MULTIMERIC GELSOLIN FUSION CONSTRUCTS

TECHNICAL FIELD OF INVENTION 

This invention relates to multimeric and
hetero-multimeric gelsolin fusion constructs,
compositions containing them and methods using them.
More particularly, this invention relates to multimeric
gelsolin fusion constructs in which at least two
gelsolin fusion polypeptides are bound to vesicles
containing polyphosphoinositides. This invention also
relates to gelsolin fusion polypeptides which comprise
gelsolin moieties linked to functional moieties and, in
particular, to CD4-gelsolin fusion polypeptides
comprising an amino acid sequence for a human CD4
protein linked to a gelsolin moiety.

BACKGROUND ART 

The rapid development of biotechnologies has
led to novel delivery and carrier systems for
pharmaceuticals, vaccines, diagnostics and other
bioactive molecules. Optimally, these systems enhance
the properties of the molecules they carry, complement
those molecules with characteristics they lack and
combine useful characteristics of different molecules.
Of particular interest to researchers are the serum
half-life of bioactive molecules, their affinity for
target particles and cells, their targetability, their
bioactivity and immunogenicity, and the ability to
administer or deliver several molecules
simultaneously. Scientists are seeking to identify new
molecules, including proteins, that they can
advantageously develop into these systems.

Gelsolin is a protein found in mammals and
other vertebrates [H.L. Yin and T.P. Stossel, "Control
of Cytoplasmic Actin Gel-sol Transformation by
Gelsolin, a Calcium-dependent Regulatory Protein",
Nature, 281, pp. 583-86 (1979); F.S. Southwick and M.J.
DiNubile, "Rabbit Alveolar Macrophages Contain a Ca2+-
sensitive, 41,000-dalton Protein Which Reversibly
Blocks the 'Barbed' Ends of Actin Filaments but Does
not Sever Them", J. Biol. Chem., 261, pp. 14191-95
(1986); T. Ankenbauer et al., "Proteins Regulating
Actin Assembly in Oogenesis and Early Embryogenesis of
Xenopus laevis: Gelsolin Is the Major Cytoplasmic
Actin-binding Protein", J. Cell Biol., 107, pp. 1489-98
(1988); H.L. Yin et al., "Identification of Gelsolin, a
Ca2+-dependent Regulatory Protein of Actin Gel-sol
Transformation and Its Intracellular Distribution in a
Variety of Cells and Tissues", J. Cell Biol., 91,
pp. 901-06 (1980); C.W. Dieffenbach et al., "Cloning of
Murine Gelsolin and Its Regulation During
Differentiation", J. Biol. Chem., 264, pp. 13281-88
(1989)]. In mammals, gelsolin occurs in two forms: a
cytoplasmic form and a serum form. Gelsolin regulates
the activity of actin, a major protein involved in cell
structure and movement. Actin is a globular protein
with a slightly elongated shape that can polymerize
into filaments. Polymerization occurs when the
"barbed" end of one actin monomer binds non-covalently
and reversibly to the "pointed" end of another. Inside
most cells, monomers and short filaments exist in a
fluid-like "sol" state until the monomers are activated
to polymerize into filaments and the filaments, in
turn, are activated to crosslink, producing a firmer
"gel" phase that forms part of the cellular
cytoskeleton. Investigators have observed that in the
presence of calcium ion, gelsolin promotes the
transition of actin from the gel phase to the sol
phase.

Gelsolin acts on actin in three ways. First,
it severs the noncovalent bonds between the actin
monomers that compose actin filaments ("severing").
Second, it binds to the barbed end of actin filaments
and prevents elongation of the filament from that end
("capping"). Third, it binds to actin monomers and
promotes the formation of actin filaments by providing
a nucleus for polymerization ("nucleation"). The
result is a steady state which favors short actin
filaments unable to support the gel phase [P.A. Janmey
et al., "Interactions of Gelsolin and Gelsolin-actin
Complexes with Actin. Effects of Calcium on Actin
Nucleation, Filament Severing, and End Blocking",
Biochemistry, 24, pp. 3714-23 (1985)].

Gelsolin's actin-severing function is
stoichiometric: one gelsolin molecule binds to two
monomers on the actin filament, breaks the filament,
and remains bound to both monomers. The binding of
gelsolin to one of the monomers is Ca++-dependent, and
chelating agents such as EGTA cause dissociation of
gelsolin from only one monomer.

Scientists have identified two
phosphatidylinositol phosphate phospholipids that bind
to and regulate the function of gelsolin. They are
phosphatidylinositol 4-monophosphate (PIP) and
phosphatidylinositol 4,5-biphosphate (PIP2) [P.A. Janmey
et al., "Polyphosphoinositide Micelles and
Polyphosphoinositide-containing Vesicles Dissociate
Endogenous Gelsolin-actin Complexes and Promote Actin
Assembly from the Fast-growing End of Actin Filaments
Blocked by Gelsolin", J. Biol. Chem., 262, pp. 12228-36
(1987); P.A. Janmey and T.P. Stossel, "Modulation of
Gelsolin Function by Phosphatidylinositol 4,5-
biphosphate", Nature, 325, pp. 362-64 (1987); and P.A.
Janmey and T.P. Stossel, "Gelsolin-phosphoinositide
Interaction", J. Biol. Chem., 264, pp. 4825-31 (1989)].
These polyphosphoinositides are minor membrane
phospholipids that play a role in signal transduction
in cells [B. Alberts et al., Molecular Biology of the
Cell, Second Edition, Garland Publishing, Inc., New
York, New York, pp. 702-703 (1989)]. Together they
comprise less than 10% of the total phospholipids of
cell membranes, and PIP2 comprises less than 1%. These
two molecules inhibit gelsolin activity by binding to
gelsolin and displacing the actin monomers bound to
it in a Ca++-independent manner.

In extensively sonicated aqueous suspensions,
both PIP and PIP2 form vesicles. PIP2 forms small
vesicles, also called micelles, of about 80 nm in
diameter that contain about one hundred PIP2
molecules. Each PIP2 micelle binds about eight
gelsolin molecules. PIP forms larger unilamellar (one-
layered) vesicles. Aggregation of PIP2 into large
unilamellar or multilamellar vesicles in the presence of
millimolar concentrations of Mg++ or nonionic detergents
decreases the ability of PIP2 to inhibit the actin
filament-severing function of gelsolin. Incorporation
of PIP2 into mixed vesicles composed of
phosphatidylcholine (PC) also decreases this ability,
although about a third of maximal activity persists,
even in vesicles containing a very high ratio of PC to
PIP2. Mixed lipid vesicles whose composition
approximates that of the cell membrane (less than 3%
PIP2) also inhibit gelsolin activity. Several other
polyphosphoinositides, whether synthesized or already
identified in nature, would also be expected to bind
gelsolin.
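The approximate figures above (about one hundred PIP2 molecules and about eight bound gelsolin molecules per micelle) imply roughly twelve to thirteen PIP2 molecules per bound gelsolin. The short Python sketch below works through that arithmetic for illustration only; it assumes all PIP2 is in micelles of uniform size, and the 10 uM input is a hypothetical value, not a measurement.

```python
"""Back-of-envelope PIP2 micelle stoichiometry (illustrative sketch only)."""

# Approximate values quoted in the text; both are rough averages.
PIP2_PER_MICELLE = 100
GELSOLIN_PER_MICELLE = 8


def micelle_capacity(pip2_uM: float) -> dict:
    """Estimate micelle and gelsolin-binding-site concentrations (micromolar)
    for a total PIP2 concentration, assuming all PIP2 is in uniform micelles."""
    if pip2_uM < 0:
        raise ValueError("concentration must be non-negative")
    micelles_uM = pip2_uM / PIP2_PER_MICELLE
    return {
        "micelles_uM": micelles_uM,
        "gelsolin_sites_uM": micelles_uM * GELSOLIN_PER_MICELLE,
        "pip2_per_bound_gelsolin": PIP2_PER_MICELLE / GELSOLIN_PER_MICELLE,
    }


if __name__ == "__main__":
    # Hypothetical input: 10 uM total PIP2.
    print(micelle_capacity(10.0))
```

With the hypothetical 10 uM input, the sketch estimates about 0.1 uM micelles offering about 0.8 uM gelsolin-binding sites.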

The cDNA for human plasma gelsolin encodes a
protein of 755 amino acids plus a 27-amino-acid signal
sequence [D.J. Kwiatkowski et al., "Plasma and Cytoplasmic
Gelsolins Are Encoded by a Single Gene and Contain a
Duplicated Actin-binding Domain", Nature, 323, pp. 455-
58 (1986)]. This cDNA sequence accounts for both the
plasma and cytoplasmic forms of gelsolin, which are the
result of alternative transcriptional initiation sites
and message processing from a single gene, 70 kb long
[D. Kwiatkowski et al., "Genomic Organization and
Biosynthesis of Secreted and Cytoplasmic Forms of
Gelsolin", J. Cell Biol., 106, pp. 375-84 (1988)]. The
difference between the plasma and cytoplasmic forms is
a 25-amino-acid residue extension on plasma gelsolin.
This appears to account for the difference in relative
molecular weight between the proteins as assessed by
SDS-polyacrylamide gel electrophoresis (SDS-PAGE),
93 kD and 90 kD, respectively.
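As a rough consistency check on the last statement, a 25-residue extension can be converted to an approximate mass. The Python sketch below assumes a rule-of-thumb average of about 110 Da per amino acid residue; the true mass of the extension depends on its composition, and SDS-PAGE estimates are themselves approximate.

```python
"""Rough check: does a 25-residue extension explain a ~3 kD SDS-PAGE shift?"""

# Assumption: ~110 Da is a common rule-of-thumb average residue mass.
AVG_RESIDUE_MASS_DA = 110.0


def extension_mass_kD(n_residues: int, avg_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate mass (kD) contributed by n_residues amino acids."""
    return n_residues * avg_mass_da / 1000.0


predicted = extension_mass_kD(25)  # 25-residue plasma extension
observed = 93.0 - 90.0             # SDS-PAGE estimates quoted in the text

if __name__ == "__main__":
    print(f"predicted ~{predicted:.2f} kD, observed ~{observed:.0f} kD")
```

The estimate of about 2.75 kD is close to the observed difference of about 3 kD.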

Investigators have identified several
functional domains of gelsolin [H.L. Yin et al.,
"Identification of a Polyphosphoinositide-modulated
Domain in Gelsolin Which Binds to the Sides of Actin
Filaments", J. Cell Biol., 106, pp. 805-12 (1988) and
D. Kwiatkowski et al., "Identification of Critical
Functional and Regulatory Domains in Gelsolin", J. Cell
Biol., 108, pp. 1717-26 (1989)]. The gelsolin cDNA
contains a strong tandem repeat that divides the
molecule into two roughly equal halves. These
structural halves correspond to two functional halves:
the amino-terminal half of the protein contains a
Ca++-insensitive actin-severing function and the
carboxy-terminal half has a Ca++-sensitive actin-binding
domain. Within these two tandem repeats are six domains
of weaker homology. The polypeptide has three actin-
binding sites. Two monomer-binding sites are located
between residues 26-139 and 407-756 (probably 661-738),
and an actin filament-binding site is located between
residues 151-406. Amino acid residues 732-738 are
potentially important for Ca++ regulation. Residues
660-738 are important for nucleation, a function that
probably requires actin-binding sites on both halves of
the molecule. The severing function resides in
residues 1-160, possibly between residues 139-160, with
critical dependence on the sequence 150-160 (the first
eleven residues of domain two). The PIP2 regulation of
gelsolin's severing activity apparently resides within
the first 160 residues. Sequences in domains 2 and 3
appear to hide a cryptic Ca++-sensitive domain, because
when they are removed, the severing function of
gelsolin becomes Ca++-dependent.
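The residue ranges above can be collected into a simple lookup table. The Python sketch below is illustrative only: the region names are descriptive labels chosen here, ranges are taken as inclusive in the Figure 1 mature-protein numbering, and where the text gives both a broad and a narrower estimate (407-756 versus 661-738) the broad range is used. It reports which annotated regions a candidate fragment overlaps and whether it fully contains a given region, such as the severing-critical segment 150-160.

```python
"""Illustrative lookup of the gelsolin functional regions described above."""
from typing import Dict, List, Tuple

# Inclusive residue ranges (mature numbering of Figure 1), as quoted in the
# text. Where a broad and a narrower estimate are given, the broad one is used.
GELSOLIN_REGIONS: Dict[str, Tuple[int, int]] = {
    "actin monomer binding site (N-terminal half)": (26, 139),
    "actin filament binding site": (151, 406),
    "actin monomer binding site (C-terminal half)": (407, 756),
    "severing and PIP2 regulation": (1, 160),
    "severing-critical segment": (150, 160),
    "nucleation": (660, 738),
    "Ca++ regulation": (732, 738),
}


def regions_overlapping(start: int, end: int) -> List[str]:
    """Names of annotated regions sharing at least one residue with start..end."""
    if start > end:
        raise ValueError("start must not exceed end")
    return [name for name, (lo, hi) in GELSOLIN_REGIONS.items()
            if lo <= end and start <= hi]


def contains_region(start: int, end: int, name: str) -> bool:
    """True if the fragment start..end fully contains the named region."""
    lo, hi = GELSOLIN_REGIONS[name]
    return start <= lo and hi <= end


if __name__ == "__main__":
    print(regions_overlapping(150, 169))
    print(contains_region(150, 169, "severing-critical segment"))
```

Overlap here is purely positional; it does not imply that a fragment retains the corresponding activity.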

Significantly, the amino acid sequence of
gelsolin exhibits homology with several other actin-
binding proteins. It is forty-five percent homologous
with villin, found in vertebrate brush border
microvilli, which also has a Ca++-dependent actin-
severing function. It is thirty-three percent
homologous with severin and fragmin [P. Matsudaira and
P. Janmey, "Pieces in the Actin-severing Protein
Puzzle", Cell, 54, pp. 139-40 (1988)]. These
polypeptides also bind PIP and PIP2.

Despite advances in biotechnology, the need
still exists for methods and products which optimize
the characteristics and delivery of pharmaceuticals,
vaccines, diagnostics and other bioactive molecules,
including their polyvalency, affinity for a single target
particle, serum half-life, bioactivity and, in some
cases, immunogenicity. Furthermore, systems in which
the component parts may be easily varied would be
especially useful because they would allow one to test
for species with optimal characteristics.

SUMMARY OF THE INVENTION 

This invention solves these problems by
providing multimeric and hetero-multimeric gelsolin
fusion constructs. A multimeric gelsolin fusion
construct is a vesicle comprising at least one
polyphosphoinositide, such as PIP or PIP2, to which
gelsolin fusion polypeptides are bound. Gelsolin
fusion polypeptides comprise gelsolin moieties linked
to functional moieties, which may be pharmaceutical
agents, vaccine agents, diagnostic agents or other
bioactive molecules. Hetero-multimeric gelsolin fusion
constructs comprise at least two different functional
moieties or gelsolin moieties.

Gelsolin is a particularly attractive
candidate for attachment to lipid vesicles because it
binds specifically and with great affinity to
polyphosphoinositides. Other proteins related to
gelsolin that specifically bind polyphosphoinositides
may also be employed. Some examples are villin,
fragmin, severin, profilin, cofilin, Cap42(a), gCap39,
CapZ and destrin. Lipocortin (annexin) and DNase I are
other molecules that bind polyphosphoinositides.
Proteins that specifically bind other lipids may also
be used, as well as proteins that bind lipids
non-specifically.

The fusion constructs of this invention
advantageously utilize the ability of
polyphosphoinositide vesicles to bind multiple copies
of gelsolin fusion polypeptides. Consequently, in
contrast to monomeric molecules, the bioactive
molecules linked to them as functional moieties are
characterized by one or more of the following:
polyvalency, increased serum half-life, affinity for
target particles or cells, greater bioactivity or
immunogenicity, and targetability.

WO 91/17170

PCT/US91/02954

The present invention also provides gelsolin
fusion polypeptides. Gelsolin fusion polypeptides
comprise gelsolin moieties fused or chemically coupled
to a functional moiety. In particular, this invention
provides CD4-gelsolin fusion polypeptides.

The lipid composition of a vesicle may also
be varied to permit the production of vesicles varying
in fluidity, size, the number of gelsolin molecules
that will bind to each vesicle and the rate of
degradation in the bloodstream.

Depending upon the choice of functional
moiety, multimeric and hetero-multimeric gelsolin
fusion constructs have many uses.
Recognition molecules, such as those containing the
antigen binding site of antibodies, viral receptors or
cell receptors, are useful as functional moieties to
target fusion proteins to particular antigens. When
targeted in this manner, multimeric gelsolin fusion
constructs are useful to block the binding of viruses
to cells that results in infection, or the binding of
cells to other cells that, for example, characterizes
pathologic inflammation. Due to the multivalency of
the fusion constructs of this invention, we believe
that they possess greater affinity for the target than
monovalent molecules. In one embodiment of this
invention, the functional moiety is CD4, the receptor
on human lymphocytes that is the target of HIV, the
causative agent of AIDS and ARC.

When hetero-multimeric fusion constructs
comprise gelsolin fusion polypeptides having
combinations of recognition molecules and toxins,
anti-retroviral agents or radionuclides, they are
useful as therapeutic agents that seek out and
destroy their target.
Multimeric gelsolin fusion constructs with
recognition molecules are also useful for signal
enhancement in diagnostic assays. As large, multimeric
molecules, they present many binding sites for reporter
molecules, such as horseradish peroxidase-conjugated
antibodies. Alternatively, they may take the form of
hetero-multimeric constructs, possessing both
recognition molecules and multiple reporter groups.

When the functional moiety is one or more
immunogens from one or more infectious agents, the
fusion proteins of this invention are useful in vaccines.

Also, multimeric gelsolin fusion constructs
may be employed as agents with increased bioactivity
when the functional moiety is an enzyme, substrate, or
inhibitor.

This invention also provides multimeric 
gelsolin fusion constructs that are liposomes whose 
constituents include polyphosphoinositides and that 
contain bioactive agents in their interiors. 

This invention further provides DNA sequences
that encode gelsolin fusion polypeptides, recombinant
DNA molecules comprising them and unicellular host
cells transformed with them. This invention also
provides methods for producing these fusion
polypeptides by culturing such hosts.

This invention also provides compositions
comprising any of the above-identified fusion
polypeptides or proteins that are useful as
therapeutic, prophylactic or diagnostic agents.
Multimeric CD4-gelsolin fusion constructs may be used
in diagnosing, preventing and treating AIDS, ARC or HIV
infection.
BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A-1F ("Figure 1") (SEQ ID NO:1)
depict the DNA sequence and deduced amino acid sequence
of human gelsolin as set forth in D.J. Kwiatkowski
et al., Nature, 323, pp. 455-58 (1986). The negatively
numbered amino acids correspond to the signal sequence,
which is absent from the mature polypeptide.
Throughout this specification, references to human
gelsolin by amino acid sequence or DNA sequence
correspond to the coordinate system set forth in this
figure.

Figure 2 depicts the functional regions of 
human gelsolin amino acid sequence. 

Figures 3A-3D ("Figure 3") (SEQ ID NO: 2)
depict the DNA sequence and deduced amino acid sequence
of human CD4. Nucleotides 1-75 are derived from
plasmid p170.2. Nucleotides 76-741 are derived from
plasmid pCD4-gelsolin. Nucleotides 742 to 1377 are
derived from p170.2. Throughout this specification,
references to CD4 by amino acid or DNA sequence
correspond to the coordinate system of this figure,
unless otherwise specified.

Figure 4 depicts the domain structure of
human CD4 protein. The numbered amino acids are
cysteine residues involved in disulfide bonding,
numbered according to Figure 3.

Figure 5 depicts the DNA sequences of the
oligomers used in the processes set forth in the
examples of this application. The gelsolin sequences
in this figure are derived from SEQ ID NO:1. ACE 144
is SEQ ID NO: 3. ACE 145 is SEQ ID NO: 4. T4AID-133 is
SEQ ID NO: 5. T4AID-134 is SEQ ID NO: 6. T4AID-137 is
SEQ ID NO: 7. T4AID-176 is SEQ ID NO: 8. T4AID-176 is
SEQ ID NO: 9.

Figure 6 depicts the construction of plasmid
pCD4-gelsolin.
Figures 7A-7B ("Figure 7") (SEQ ID NO: 10)
depict the DNA sequence and deduced amino acid
sequence of pCD4-gelsolin.

Figure 8 is a restriction map of
pCD4-gelsolin.

Figure 9 depicts the construction of plasmid
pDC219.

Figures 10A-10F ("Figure 10") (SEQ ID NO: 11)
depict the DNA sequence of p218-8.

Figure 11 depicts the construction of plasmid
pAPL180cys.

Figures 12A-12I ("Figure 12") (SEQ ID NO: 12)
depict the DNA sequence of pBG391.

Figures 13A-13H ("Figure 13") (SEQ ID NO: 13)
depict the DNA sequence of pEX46.

DETAILED DESCRIPTION OF THE INVENTION 

"Human plasma gelsolin" refers to a
polypeptide having the amino acid sequence depicted in
Figure 1 (SEQ ID NO:1) from amino acids -27 to +755.
It should be understood that polypeptide expression
often involves post-translational modifications such as
cleavage of the signal sequence, intramolecular
disulfide bonding, glycosylation and the like. The use
of the term "human plasma gelsolin" contemplates such
modifications to the amino acid sequence of Figure 1
(SEQ ID NO:1). The term also includes gelsolin
obtained from natural, recombinant or synthetic
sources.
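Because Figure 1 numbers the signal sequence -27 to -1 and the mature protein +1 to +755, figure coordinates differ from the 0-based positions a program would use to index a precursor sequence string. The Python sketch below converts between the two; it assumes that Figure 1 has no residue numbered 0, and the function names are hypothetical helpers, not part of any published tool.

```python
"""Convert between Figure 1 residue coordinates and 0-based sequence indices."""

SIGNAL_LENGTH = 27   # residues -27..-1 (signal sequence)
MATURE_LENGTH = 755  # residues +1..+755 (mature plasma gelsolin)
PRECURSOR_LENGTH = SIGNAL_LENGTH + MATURE_LENGTH


def figure_to_index(pos: int) -> int:
    """Map a Figure 1 coordinate (assumed to skip 0) to a 0-based index
    into the full precursor sequence."""
    if pos == 0 or not -SIGNAL_LENGTH <= pos <= MATURE_LENGTH:
        raise ValueError(f"{pos} is not a valid Figure 1 coordinate")
    return pos + SIGNAL_LENGTH if pos < 0 else pos + SIGNAL_LENGTH - 1


def index_to_figure(idx: int) -> int:
    """Inverse of figure_to_index."""
    if not 0 <= idx < PRECURSOR_LENGTH:
        raise ValueError(f"{idx} is outside the precursor")
    return idx - SIGNAL_LENGTH if idx < SIGNAL_LENGTH else idx - SIGNAL_LENGTH + 1
```

For example, with a hypothetical precursor string `precursor`, residues +150 to +160 would be `precursor[figure_to_index(150):figure_to_index(160) + 1]`.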

"Multimeric gelsolin fusion constructs" and
"hetero-multimeric gelsolin fusion constructs" each
comprise gelsolin fusion polypeptides bound to a
vesicle of aggregated phospholipids. A "gelsolin
fusion polypeptide" comprises a gelsolin moiety bound
to a functional moiety. "Functional moieties" may be
polypeptides ("polypeptide moieties") or chemical
compounds ("chemical moieties"). Throughout this
application, specific gelsolin fusion polypeptides are
referred to by the name of the functional moiety. For
example, we call a gelsolin fusion polypeptide having
CD4 as the functional moiety a CD4-gelsolin fusion
polypeptide. Hetero-multimeric gelsolin fusion
constructs comprise at least two different functional
moieties or gelsolin moieties.

When the functional moiety is a polypeptide,
gelsolin fusion polypeptides may be produced by
chemical crosslinking or genetic fusion. Genetic
fusion involves creating a hybrid DNA sequence in which
the DNA sequence encoding the polypeptide is fused to
the 5' end or 3' end of a DNA sequence encoding the
gelsolin moiety. Upon expression in an appropriate
host, this hybrid DNA sequence produces a gelsolin
fusion polypeptide in which the polypeptide moiety is
fused to the N-terminus or C-terminus of the gelsolin
moiety.
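A key requirement of genetic fusion is that the two coding sequences be joined in the same reading frame, with no stop codon between them. The Python sketch below illustrates only that bookkeeping: it removes the upstream stop codon and rejects out-of-frame or prematurely terminated fusions. It does not model linkers, restriction sites, signal sequences or expression vectors, and the toy sequences in the example are hypothetical, not gelsolin or CD4 sequences.

```python
"""Sketch of an in-frame genetic fusion between two coding sequences."""

STOP_CODONS = {"TAA", "TAG", "TGA"}


def _codons(seq: str) -> list:
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(upstream: str, downstream: str) -> str:
    """Join two open reading frames so translation reads through both.

    The upstream stop codon (if present) is removed; an out-of-frame input
    or any in-frame stop before the final codon raises ValueError.
    """
    up, down = upstream.upper(), downstream.upper()
    if up[-3:] in STOP_CODONS:
        up = up[:-3]
    for label, seq in (("upstream", up), ("downstream", down)):
        if len(seq) % 3:
            raise ValueError(f"{label} length {len(seq)} is not a multiple of 3")
    fused = up + down
    if any(codon in STOP_CODONS for codon in _codons(fused)[:-1]):
        raise ValueError("premature in-frame stop codon in fused sequence")
    return fused


def n_terminal_fusion(moiety_cds: str, gelsolin_cds: str) -> str:
    """Moiety DNA at the 5' end: moiety fused to the gelsolin N-terminus."""
    return fuse_in_frame(moiety_cds, gelsolin_cds)


def c_terminal_fusion(moiety_cds: str, gelsolin_cds: str) -> str:
    """Moiety DNA at the 3' end: moiety fused to the gelsolin C-terminus."""
    return fuse_in_frame(gelsolin_cds, moiety_cds)


if __name__ == "__main__":
    moiety = "ATGGCTTAA"    # toy ORF: Met-Ala-stop (hypothetical)
    gelsolin = "ATGGTGTGA"  # toy ORF: Met-Val-stop (hypothetical)
    print(n_terminal_fusion(moiety, gelsolin))
    print(c_terminal_fusion(moiety, gelsolin))
```

The two toy fusions yield ATGGCTATGGTGTGA and ATGGTGATGGCTTAA, each ending in a single terminal stop codon.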

A "gelsolin moiety" as used herein is
gelsolin or a fragment thereof that specifically binds
to a polyphosphoinositide. Preferably, the gelsolin
moiety will be derived from human plasma gelsolin. A
gelsolin moiety preferably includes amino acids +150 to
+160 of Figure 1 (SEQ ID NO:1). As demonstrated
herein, the polypeptide containing amino acids +150 to
+169 of Figure 1 (SEQ ID NO:1) has the ability to bind
PIP2. We believe that gelsolin derived from non-human
vertebrates may also be useful according to this
invention. The structure of gelsolin is highly
conserved in evolution, and gelsolin from non-human
mammals may not be immunogenic in humans.

Lipid binding proteins ("LBPs") other than
gelsolin are also known to exist. These proteins, or
fragments of them that bind to particular lipids, are
useful as LBP moieties (similarly to gelsolin moieties)
to produce LBP fusion polypeptides that bind to
vesicles containing the particular lipid. This creates
multimeric or hetero-multimeric LBP fusion constructs.
Gelsolin-related proteins that specifically bind
polyphosphoinositides include villin, severin, fragmin,
profilin, cofilin, Cap42(a), gCap39, CapZ and destrin
[E. Andre et al., "Severin, Gelsolin, and Villin Share
a Homologous Sequence in Regions Presumed to Contain
F-actin Severing Domains", J. Biol. Chem., 263,
pp. 722-27 (1988); W.L. Bazari et al., "Villin Sequence
and Peptide Map Identify Six Homologous Domains", Proc.
Natl. Acad. Sci. USA, 85, pp. 4986-90 (1988); C. Ampe
et al., "The Primary Structure of Human Platelet
Profilin: Reinvestigation of the Calf Spleen Sequence",
FEBS Letters, 228, pp. 17-21 (1988); D.J. Kwiatkowski
and G.A.P. Bruns, "Human Profilin", J. Biol. Chem.,
263, pp. 5910-15 (1988); I. Lassing and U. Lindberg,
"Specificity of the Interaction Between
Phosphatidylinositol 4,5-biphosphate and the Profilin:
Actin Complex", J. Cell. Biochem., 37, pp. 255-67
(1988); C. Ampe and J. Vandekerckhove, "The F-actin
Capping Proteins of Physarum polycephalum", EMBO J.,
6, pp. 4149-57 (1987); I. Lassing and U. Lindberg,
"Specific Interaction between Phosphatidylinositol 4,5-
biphosphate and Profilactin", Nature, 314, pp. 472-74
(1985); F.-X. Yu et al., "gCap39, a Calcium Ion- and
Polyphosphoinositide-regulated Actin Capping Protein",
Science, 250, pp. 1413-15 (1990); and N. Yonezawa
et al., "Inhibition of the Interactions of Cofilin,
Destrin and Deoxyribonuclease I with Actin by
Phosphoinositides", J. Biol. Chem., 265, pp. 8382-86
(1990)]. Other LBPs that specifically bind
polyphosphoinositides are lipocortin [K. Machoczek
et al., "Lipocortin I and Lipocortin II Inhibit
Phosphoinositide- and Polyphosphoinositide-specific
Phospholipase C", FEBS Letters, 251, pp. 207-12 (1989)]
and DNase I [J.A. Cooper et al., "The Role of Actin
Polymerization in Cell Motility", Ann. Rev. Physiol., 53,
pp. 585-605 (1991)]. Protein kinase C is also an LBP
which binds to some phospholipids.

DNA sequences encoding gelsolin moieties are
derived from DNA sequences encoding gelsolin. Several
methods are available to obtain these DNA sequences.
First, one can chemically synthesize the gelsolin gene
or a degenerate version of it using a commercially
available chemical synthesizer. Figure 1 (SEQ ID NO:1)
sets forth a DNA sequence for gelsolin. The coding
region encompasses nucleotides +1 to +2360.

Second, one can isolate a cDNA sequence
encoding gelsolin by screening a cDNA library. Many
screening methods are known in the art. For example,
colonies may be screened by nucleic acid hybridization
with oligonucleotide probes. Probes may be prepared by
chemically synthesizing an oligonucleotide having part
of the known DNA sequence of gelsolin. Alternatively,
cDNA libraries may be constructed in expression
vectors, such as λgt11, and the resulting clones
screened with anti-gelsolin antibodies.

Third, one can isolate a cDNA encoding
gelsolin or a gelsolin moiety by amplifying DNA with
the polymerase chain reaction (PCR). We describe this
process in Example I.

The DNA sequence encoding the gelsolin moiety
may then be fused to a DNA sequence encoding the
polypeptide moiety. DNA sequences for the polypeptide
moieties useful in this invention are available from
many sources. These include DNA sequences described in
the literature and DNA sequences for particular
polypeptides obtained by any of the conventional
molecular cloning techniques.

A wide array of polypeptides is useful to
produce the gelsolin fusion polypeptides of this
invention. The most useful include polypeptides that
are advantageously administered in multimeric form.
For example, viral receptors, cell receptors or cell
ligands are useful because they typically bind to
particles or cells exhibiting many copies of the
receptor. Fusion constructs containing these fusion
polypeptides are useful in therapies that involve the
inhibition of viral-cell or cell-cell binding. Useful
viral-cell receptors include ICAM1, a rhinovirus
receptor; the polio virus receptor [J.M. White and
D.R. Littman, "Viral Receptors of the Immunoglobulin
Superfamily", Cell, 56, pp. 725-28 (1989)]; and, most
preferably, CD4, the HIV receptor. Cell-cell receptors
or ligands include members of the vascular cell
adhesion molecule family, such as ICAM1, ELAM1, VCAM1
and VCAM1b, and their lymphocyte counterparts (ligands)
LFA1, CDX and VLA4. These molecules are involved in
pathologic inflammation [M.P. Bevilacqua et al.,
"Identification of an Inducible Endothelial-Leukocyte
Adhesion Molecule", Proc. Natl. Acad. Sci. USA, 84,
pp. 9238-42 (1987); L. Osborn et al., "Direct
Expression Cloning of Vascular Cell Adhesion
Molecule 1: A Cytokine-induced Endothelial Protein
that Binds to Lymphocytes", Cell, 59, pp. 1203-11
(1989); C.A. Hession et al., "Endothelial Cell-
leukocyte Adhesion Molecules (ELAMs) and Molecules
Involved in Leukocyte Adhesion (MILAs)", WO 90/13300].
Other lymphocyte-associated antigens, such as LFA2
(CD2) and LFA3 (both members of the CD11/CD18 family)
and PAGEM, are also useful.

Bacterial immunogens, parasitic immunogens
and viral immunogens may be used as polypeptide
moieties to produce multimeric or hetero-multimeric
gelsolin fusion constructs useful as vaccines.
Bacterial sources of these immunogens include those
responsible for bacterial pneumonia and Pneumocystis
parasites, such as Plasmodium. Viral sources include
poxviruses, e.g., cowpox virus and orf virus;
herpesviruses, e.g., herpes simplex virus types 1 and 2,
B-virus, varicella-zoster virus, cytomegalovirus, and
Epstein-Barr virus; adenoviruses, e.g., mastadenovirus;
papovaviruses, e.g., papillomaviruses and
polyomaviruses such as BK and JC virus; parvoviruses,
e.g., adeno-associated virus; reoviruses, e.g.,
reoviruses 1, 2 and 3; orbiviruses, e.g., Colorado tick
fever virus; rotaviruses, e.g., human rotaviruses;
alphaviruses, e.g., Eastern encephalitis virus and
Venezuelan encephalitis virus; rubiviruses, e.g.,
rubella; flaviviruses, e.g., yellow fever virus, Dengue
fever viruses, Japanese encephalitis virus, tick-borne
encephalitis virus and hepatitis C virus;
coronaviruses, e.g., human coronaviruses;
paramyxoviruses, e.g., parainfluenza 1, 2, 3 and 4 and
mumps; morbilliviruses, e.g., measles virus;
pneumoviruses, e.g., respiratory syncytial virus;
vesiculoviruses, e.g., vesicular stomatitis virus;
lyssaviruses, e.g., rabies virus; orthomyxoviruses,
e.g., influenza A and B; bunyaviruses, e.g., LaCrosse
virus; phleboviruses, e.g., Rift Valley fever virus;
nairoviruses, e.g., Congo hemorrhagic fever virus;
hepadnaviridae, e.g., hepatitis B; arenaviruses, e.g.,
LCM virus, Lassa virus and Junin virus; retroviruses,
e.g., HTLV I, HTLV II, HIV I and HIV II; enteroviruses,
e.g., polio virus 1, 2 and 3, coxsackie viruses,
echoviruses, human enteroviruses, hepatitis A virus,
hepatitis E virus, and Norwalk virus; rhinoviruses,
e.g., human rhinovirus; and filoviridae, e.g., Marburg
(disease) virus and Ebola virus.

Immunoglobulins or fragments thereof that bind
to a target molecule may also be employed as functional
moieties. Immunoglobulin molecules are bivalent, but
multimeric immunoglobulin-gelsolin fusion constructs,
which are multivalent, may demonstrate increased
affinity or avidity for the target. Investigators have
also made use of single domain antibodies (dAbs) [E.S.
Ward et al., "Binding Activities of a Repertoire of
Single Immunoglobulin Variable Domains Secreted from
Escherichia coli", Nature, 341, pp. 544-46 (1989)].
One can generate monoclonal Fab fragments recognizing
specific antigens using the technique of Huse et al.
and use individual domains as functional moieties in
multimeric or hetero-multimeric gelsolin fusion
constructs according to this invention [W.D. Huse
et al., "Generation of a Large Combinatorial Library of
the Immunoglobulin Repertoire in Phage Lambda",
Science, 246, pp. 1275-81 (1989); see also A. Skerra
and A. Pluckthun, "Assembly of a Functional
Immunoglobulin Fv Fragment in Escherichia coli",
Science, 240, pp. 1038-43 (1988)].

According to this invention, multimeric
gelsolin fusion constructs may be produced in which the
functional moiety is an enzyme, enzyme substrate or
enzyme inhibitor. We believe that such agents will
exhibit greater bioactivity than monomeric molecules
because multimers have a higher density of the moiety
and will exhibit an increased catalytic rate. For
example, we believe that a multimeric gelsolin fusion
construct with tissue plasminogen activator would have
greater clot-dissolving catalytic activity than its
monovalent counterpart. Similarly, we believe that a
multimeric gelsolin fusion construct with hirudin would
demonstrate greater anti-coagulant activity than
hirudin alone.

Other useful functional moieties include, but
are not limited to, polypeptides such as cytokines,
including the various IFN-αs, particularly α2, α5 and α8,
IFN-β and IFN-γ, the various interleukins, including
IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7 and IL-8, and
the tumor necrosis factors, TNF-α and TNF-β. In addition,
functional moieties include monocyte colony stimulating
factor (M-CSF), granulocyte colony stimulating factor
(G-CSF), granulocyte macrophage colony stimulating
factor (GM-CSF), erythropoietin, platelet-derived
growth factor (PDGF), and human and animal hormones,
including growth hormones and insulin.

According to one embodiment of this
invention, multimeric gelsolin fusion constructs
comprise CD4-gelsolin fusion polypeptides. CD4 is the
receptor on T-lymphocytes, a class of white blood
cells, that recognizes HIV, the causative agent of AIDS
and ARC [P.J. Maddon et al., "The T4 Gene Encodes the AIDS
Virus Receptor and Is Expressed in the Immune System
and the Brain", Cell, 47, pp. 333-48 (1986)].
Specifically, CD4 recognizes the HIV viral surface
proteins gp120 and gp160. In CD4-gelsolin fusion
polypeptides the functional moiety is a polypeptide
moiety comprising full length CD4 or a fragment
thereof, preferably soluble CD4. Use of the term CD4
in this specification may refer to full length CD4 or
fragments of CD4, unless otherwise specified.

A DNA sequence encoding full length human CD4 polypeptide and its deduced amino acid sequence are set forth in Figure 3 (SEQ ID NO: 2). (See also P.J. Maddon et al., "The Isolation and Nucleotide Sequence of a cDNA Encoding the T Cell Surface Protein T4: A New Member of the Immunoglobulin Gene Family", Cell, 42, pp. 93-104 (1985).) Based upon its deduced primary structure, the CD4 polypeptide is divided into functional domains as follows:

Structure/Proposed Location                            Amino Acid Coordinates in Figure 3

Hydrophobic/Secretory Signal                           -25 to -1
First Immunoglobulin-related domain/Extracellular      +1 to +107
Second Immunoglobulin-related domain/Extracellular     +108 to +177
Third Immunoglobulin-related domain/Extracellular      +178 to +293
Fourth Immunoglobulin-related domain/Extracellular     +294 to +370
Hydrophobic/Transmembrane Sequence                     +371 to +391
Very Hydrophilic/Intracytoplasmic                      +392 to +431

The first immunoglobulin-related domain can be further resolved into a variable-related (V) region and a joint-related (J) region beginning at about amino acid +95 [S.J. Clark et al., "Peptide and Nucleotide Sequences of Rat CD4 (W3/25) Antigen: Evidence for Derivation from a Structure with Four Immunoglobulin-related Domains", Proc. Natl. Acad. Sci. USA, 84, pp. 1649-53 (1987)].
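
By way of illustration only, the coordinate table can be treated as a lookup from residue position to domain. The Python sketch below reports which domains a truncated polypeptide CD4(X) contains whole and which it cuts part-way. The helper names are hypothetical, and the +294 to +370 span used for the fourth domain is an assumption inferred from the neighbouring boundaries (+293 and +371).

```python
# Illustrative sketch; helper names are hypothetical.
# Domain boundaries in the Figure 3 coordinate system (+1 = first residue of the
# mature protein). The fourth domain's span (+294 to +370) is inferred from its
# neighbours rather than quoted from the table.
CD4_DOMAINS = [
    ("first Ig-related", 1, 107),
    ("second Ig-related", 108, 177),
    ("third Ig-related", 178, 293),
    ("fourth Ig-related", 294, 370),
    ("transmembrane", 371, 391),
    ("intracytoplasmic", 392, 431),
]


def domains_retained(x: int) -> list[str]:
    """Domains lying entirely within CD4(X), i.e. residues +1 to +X."""
    if x < 1:
        raise ValueError("X must be a positive residue coordinate")
    return [name for name, _start, end in CD4_DOMAINS if end <= x]


def domains_cut(x: int) -> list[str]:
    """Domains that a truncation at +X enters but does not complete."""
    return [name for name, start, end in CD4_DOMAINS if start <= x < end]


for x in (111, 181, 375):
    print(f"CD4({x}): whole={domains_retained(x)} partial={domains_cut(x)}")
```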

These domains also correspond roughly to structural domains of the CD4 protein because of intra-domain disulfide bonding. Disulfide bonds join amino acids +16 and +84 in the first immunoglobulin-related domain, amino acids +130 and +159 in the second immunoglobulin-related domain, and amino acids +303 and +345 in the fourth immunoglobulin-related domain. Figure 4 depicts the domain structure of the full length human CD4 protein.
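
A short sketch makes the link between these bonds and truncation points explicit: a fragment CD4(X) keeps a bond only if both of its cysteines lie at or before +X. The Python helper below is hypothetical and uses only the coordinates stated in this paragraph. It also shows that the minimum lengths preferred later in this specification for one-, two- and four-domain soluble CD4 (+84, +159 and +345) coincide with the second cysteine of each bond.

```python
# Illustrative sketch; the helper name is hypothetical.
# Intra-domain disulfide bonds listed in the text (Figure 3 coordinates).
DISULFIDES = {
    "first Ig-related": (16, 84),
    "second Ig-related": (130, 159),
    "fourth Ig-related": (303, 345),
}


def intact_disulfides(x: int) -> dict[str, bool]:
    """True for each bond whose two cysteines both fall within residues +1 to +X."""
    return {domain: max(pair) <= x for domain, pair in DISULFIDES.items()}


# The preferred minimum lengths for one-, two- and four-domain soluble CD4
# (+84, +159 and +345) each complete one more bond.
for x in (84, 159, 345):
    print(x, intact_disulfides(x))
```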

WO 91/17170                                        PCT/US91/02954

Soluble CD4 proteins have been constructed by truncating the full length CD4 protein at amino acid +375 to eliminate the transmembrane and cytoplasmic domains. Such proteins have been produced by recombinant DNA techniques and are referred to as recombinant soluble CD4 (rsCD4) [R.A. Fisher et al., "HIV Infection Is Blocked In Vitro by Recombinant Soluble CD4", Nature, 331, pp. 76-78 (1988); Fisher et al., PCT patent application WO 89/01940 (incorporated herein by reference)]. These soluble CD4 proteins advantageously interfere with the CD4+ lymphocyte/HIV interaction by blocking or competitive binding mechanisms that inhibit HIV infection of cells expressing the CD4 protein. The first immunoglobulin-related domain is sufficient to bind gp120 and gp160. By acting as soluble virus receptors, soluble CD4 proteins are useful as antiviral therapeutics to inhibit HIV binding to CD4+ lymphocytes and virally induced syncytia formation.

The CD4 polypeptides useful in this invention include all CD4 polypeptides that bind to or otherwise inhibit gp120 and gp160. These include fragments of CD4 lacking the transmembrane domain, amino acids +371 to +391 of Figure 3 (SEQ ID NO: 2). Such fragments may be truncated forms of CD4 or fusion proteins in which the fourth immunoglobulin-related domain is joined directly to the hydrophilic cytoplasmic domain.

We shall refer herein to a CD4 polypeptide that includes amino acids +1 to +X of Figure 3 (SEQ ID NO: 2), optionally with an N-terminal methionine or f-methionine, as "CD4(X)". When a CD4 polypeptide is engineered to include a carboxy-terminal cysteine, we shall refer to the polypeptide as "CD4(XCys)".

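The naming convention is mechanical enough to express as a pair of helpers. The Python sketch below is illustrative only; the function names are hypothetical, and it simply formats and parses the CD4(X) and CD4(XCys) labels defined above, tolerating the spaced form "CD4 (X)".

```python
import re

# Illustrative sketch; helper names are hypothetical.
# Accepts "CD4(181)" and "CD4(181Cys)"; spaces are stripped before matching.
_CD4_NAME = re.compile(r"CD4\((\d+)(Cys)?\)")


def cd4_name(x: int, c_terminal_cys: bool = False) -> str:
    """Name a CD4 polypeptide spanning residues +1 to +X.

    An optional N-terminal Met or f-Met does not change the name.
    """
    if x < 1:
        raise ValueError("X must be at least 1")
    return f"CD4({x}Cys)" if c_terminal_cys else f"CD4({x})"


def parse_cd4_name(name: str) -> tuple[int, bool]:
    """Return (X, has_c_terminal_cys) for a name in the CD4(X)/CD4(XCys) convention."""
    match = _CD4_NAME.fullmatch(name.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a CD4(X) name: {name!r}")
    return int(match.group(1)), match.group(2) is not None


print(cd4_name(181), cd4_name(181, c_terminal_cys=True), parse_cd4_name("CD4 (375)"))
```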
For example, referring now to Figure 3 (SEQ ID NO: 2), a soluble CD4 protein containing the first immunoglobulin-like domain preferably will contain at least amino acids +1 to +84 and at most amino acids +1 to +129. Most preferably, such a soluble CD4 protein comprises amino acids +1 to +111 [CD4(111)]. A soluble CD4 protein containing the first two immunoglobulin-like domains preferably will include at least amino acids +1 to +159 and at most amino acids +1 to +302. More preferably, such a protein will include at least amino acids +1 to +175 and at most amino acids +1 to +190. Most preferably, it will include amino acids +1 to +181 [CD4(181)], amino acids +1 to +183 [CD4(183)], or amino acids +1 to +187 [CD4(187)]. A soluble CD4 protein that includes the first four immunoglobulin-like domains preferably will include at least amino acids +1 to +345 [CD4(345)] and at most amino acids +1 to +375 [CD4(375)]. Any of these molecules may optionally include the CD4 signal sequence, amino acids -23 to -1 of Figure 3 (SEQ ID NO: 2). These molecules may also have a modified methionine residue preceding amino acid +1.
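
The preference tiers above can be restated as data plus a small classifier. In this Python sketch the tier boundaries are read directly from the preceding paragraph (all ranges inclusive); the function name and the returned labels are hypothetical and serve only to illustrate how a given truncation point CD4(X) falls within the stated ranges.

```python
# Illustrative sketch; the function name and labels are hypothetical.
# Tiers read from the text: (label, preferred (min X, max X),
# more preferred range or None, most preferred X values).
TIERS = [
    ("first domain", (84, 129), None, (111,)),
    ("first two domains", (159, 302), (175, 190), (181, 183, 187)),
    ("first four domains", (345, 375), None, ()),
]


def classify_truncation(x: int) -> str:
    """Return the most specific preference tier that CD4(X) falls into."""
    for label, (low, high), more, most in TIERS:
        if low <= x <= high:
            if x in most:
                return f"{label}: most preferred"
            if more is not None and more[0] <= x <= more[1]:
                return f"{label}: more preferred"
            return f"{label}: preferred"
    return "outside the stated ranges"


for x in (111, 140, 176, 183, 250, 375):
    print(f"CD4({x}) -> {classify_truncation(x)}")
```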

Soluble CD4 proteins useful in the fusion polypeptides and methods of this invention may be produced in a variety of ways. According to the coordinate system in Figure 3 (SEQ ID NO: 2), the amino-terminal amino acid of mature CD4 protein isolated from T cells is lysine, encoded at nucleotides 136 to 139 of Figure 3 (SEQ ID NO: 2) [D.R. Littman et al., "Corrected CD4 Sequence", Cell, 55, p. 541 (1988)]. Soluble CD4 proteins also include those in which amino acid +1 is asparagine, +62 is arginine and +229 is phenylalanine. Accordingly, when we refer to CD4, we intend to include amino acid sequences carrying any or all of these substitutions. Soluble CD4 polypeptides may be produced by conventional recombinant techniques involving oligonucleotide-directed mutagenesis and restriction digestion followed by insertion of linkers, or by digesting full-length CD4 protein with proteolytic enzymes.
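
Because the definition covers "any or all" of three independent substitutions, it spans eight sequence forms, including the unsubstituted one. The Python sketch below enumerates them. The helper names are hypothetical, and the demonstration uses a placeholder poly-alanine sequence with lysine at +1, not the actual CD4 sequence of Figure 3.

```python
from itertools import combinations

# Illustrative sketch; helper names are hypothetical.
# Substitutions named in the text, keyed by Figure 3 coordinate
# (+1 = first residue of the mature protein): Asn at +1, Arg at +62, Phe at +229.
SUBSTITUTIONS = {1: "N", 62: "R", 229: "F"}


def apply_substitutions(seq: str, subs: dict[int, str]) -> str:
    """Return seq with residue +i replaced by subs[i]; residue +1 is seq[0]."""
    residues = list(seq)
    for pos, aa in subs.items():
        if not 1 <= pos <= len(residues):
            raise IndexError(f"+{pos} lies outside a {len(residues)}-residue sequence")
        residues[pos - 1] = aa
    return "".join(residues)


def covered_variants(seq: str) -> list[str]:
    """The reference sequence plus every combination of the listed substitutions."""
    positions = sorted(SUBSTITUTIONS)
    variants = []
    for k in range(len(positions) + 1):
        for chosen in combinations(positions, k):
            variants.append(apply_substitutions(seq, {p: SUBSTITUTIONS[p] for p in chosen}))
    return variants


# Placeholder sequence only: Lys at +1 (as stated in the text) followed by Ala.
placeholder = "K" + "A" * 374
print(len(covered_variants(placeholder)))  # 8 sequence forms (2 ** 3)
```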

Soluble CD4 proteins include those produced by recombinant techniques according to the processes set forth in copending, commonly assigned United States patent applications Serial No. 094,322, filed September 4, 1987, and Serial No. 141,649, filed January 7, 1988, and PCT patent application Serial No. PCT/US88/02940, filed September 1, 1988, and published as PCT patent application WO 89/01940, the disclosures of which are hereby incorporated by reference.

Microorganisms and recombinant DNA molecules characterized by DNA sequences coding for soluble CD4 proteins are exemplified by cultures described in PCT patent application WO 89/01940. They were deposited in the In Vitro International, Inc. culture collection in Linthicum, Maryland, USA on September 2, 1987 and identified as:

EC100: E.coli JM83/pEC100 - IVI 10146
BG377: E.coli MC1061/pBG377 - IVI 10147
BG380: E.coli MC1061/pBG380 - IVI 10148
BG381: E.coli MC1061/pBG381 - IVI 10149.

Such microorganisms and recombinant DNA molecules are also exemplified by cultures deposited in the In Vitro International, Inc. culture collection on January 6, 1988 and identified as:

BG-391: E.coli MC1061/pBG391 - IVI 10151
BG-392: E.coli MC1061/pBG392 - IVI 10152
BG-393: E.coli MC1061/pBG393 - IVI 10153
BG-394: E.coli MC1061/pBG394 - IVI 10154
BG-396: E.coli MC1061/pBG396 - IVI 10155
203-5: E.coli SG936/p203-5 - IVI 10156.

Additionally, such microorganisms and recombinant DNA molecules are exemplified by cultures deposited in the In Vitro International, Inc. culture collection on August 24, 1988 and identified as:

211-11: E.coli A89/pBG211-11 - IVI 10183
214-10: E.coli A89/pBG214-10 - IVI 10184
215-7: E.coli A89/pBG215-7 - IVI 10185.

Multimeric CD4-gelsolin fusion constructs comprising CD4-gelsolin fusion polypeptides may be used in pharmaceutical compositions and methods to treat humans having AIDS, ARC, HIV infection, or antibodies to HIV. Accordingly, they may be used to lessen the immunocompromising effects of HIV infection or to prevent the incidence and spread of HIV infection. In addition, these fusion proteins and methods may be used to treat AIDS-like diseases caused by retroviruses, such as simian immunodeficiency viruses, in mammals, including humans.

DNA sequences encoding gelsolin fusion polypeptides are useful for producing multimeric gelsolin fusion constructs. The preferred process for using these DNA sequences involves expressing the gelsolin fusion polypeptide in an appropriate host, isolating the polypeptide, and binding it to a vesicle comprising a polyphosphoinositide.

As is well known in the art, to express the DNA sequences of this invention, the DNA sequence should be operatively linked to an expression control sequence in an appropriate expression vector, and that expression vector should be used to transform an appropriate unicellular host. Such operative linking of a DNA sequence of this invention to an expression control sequence includes the provision of a translational start signal in the correct reading frame upstream of the DNA sequence. If a particular DNA sequence being expressed does not begin with an ATG, the start signal will result in an additional amino acid, methionine (or f-methionine in bacteria), being located at the N-terminus of the product.

While such a methionyl-containing product may be employed directly in the compositions and methods of this invention, it is usually more desirable to remove the methionine before use. Methods for removing such N-terminal methionines from polypeptides expressed with them are known to those of skill in the art. For example, certain hosts and fermentation conditions permit removal of substantially all of the N-terminal methionine in vivo, whereas expression in other hosts requires in vitro removal of the N-terminal methionine. Both the in vivo and the in vitro methods are well known in the art.
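
The reading-frame point can be illustrated with a small sequence helper. This Python function is hypothetical and deliberately narrow: it only decides whether an in-frame ATG must be added and, therefore, whether the product will carry an extra N-terminal methionine. A real expression construct also needs a promoter, ribosome binding site and other elements that are outside its scope.

```python
# Illustrative sketch; the helper name is hypothetical.
START_CODON = "ATG"


def with_start_codon(coding_seq: str) -> tuple[str, bool]:
    """Prefix an in-frame ATG when the coding sequence lacks one.

    Returns the sequence to clone and whether the expressed product will carry
    an extra N-terminal (f-)methionine that may need removing before use.
    """
    seq = coding_seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("length must be a multiple of 3 to keep the reading frame")
    if seq.startswith(START_CODON):
        return seq, False  # the Met is already part of the encoded sequence
    return START_CODON + seq, True


# Mature CD4 begins with lysine (codon AAA or AAG), not ATG, so expressing it
# directly from an added start codon adds one Met.
print(with_start_codon("AAGAAA"))
```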

A wide variety of host/expression vector combinations may be employed in expressing the DNA sequences of this invention. Useful expression vectors, for example, may consist of segments of chromosomal, non-chromosomal and synthetic DNA sequences, such as various known derivatives of SV40 and known bacterial plasmids, e.g., plasmids from E.coli including colE1, pCR1, pBR322, pMB9 and their derivatives; wider host range plasmids, e.g., RP4; phage DNAs, e.g., the numerous derivatives of phage λ, e.g., NM989, and other DNA phages, e.g., M13 and filamentous single-stranded DNA phages; yeast plasmids, such as the 2µ plasmid or derivatives thereof; and vectors derived from combinations of plasmids and phage DNAs, such as plasmids that have been modified to employ phage DNA or other expression control sequences.

In addition, any of a wide variety of expression control sequences (sequences that control the expression of a DNA sequence when operatively linked to it) may be used in these vectors to express the DNA sequences of this invention. Such useful expression control sequences include, for example, the early and late promoters of SV40 or adenovirus, the lac system, the trp system, the TAC or TRC system, the major operator and promoter regions of phage λ, the control regions of fd coat protein, the promoter for 3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid phosphatase, e.g., Pho5, the promoters of the yeast α-mating factors, the polyhedrin promoter of the baculovirus system, other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various combinations thereof.

A wide variety of unicellular host cells are also useful in expressing the DNA sequences of this invention. These hosts include well known eukaryotic and prokaryotic hosts, such as strains of E.coli, Pseudomonas, Bacillus and Streptomyces; fungi, such as yeasts; animal cells, such as CHO and mouse cells; African green monkey cells, such as COS-1, COS-7, BSC 1, BSC 40 and BMT 10; insect cells; and human cells and plant cells in tissue culture. For animal cell expression, we prefer CHO cells and COS-7 cells.

It should of course be understood that not all vectors and expression control sequences will function equally well to express the DNA sequences of this invention. Neither will all hosts function equally well with the same expression system. However, one of skill in the art may make a selection among these vectors, expression control sequences, and hosts without undue experimentation and without departing from the scope of this invention. For example, in selecting a vector, the host must be considered because the vector must replicate in it. The vector's copy number, the ability to control that copy number, and the expression of any other proteins encoded by the vector, such as antibiotic markers, should also be considered.

In selecting an expression control sequence, a variety of factors should also be considered. These include, for example, the relative strength of the system, its controllability, and its compatibility with the particular DNA sequence of this invention, particularly with regard to potential secondary structures. Unicellular hosts should be selected by considering their compatibility with the chosen vector, the toxicity to them of the product encoded by the DNA sequences of this invention, their secretion characteristics, their ability to fold proteins correctly, their fermentation requirements, and the ease of purifying the products encoded by the DNA sequences of this invention.

Within these parameters, one of skill in the art may select various vector/expression control system/host combinations that will express the DNA sequences of this invention on fermentation or in large scale animal culture, e.g., CHO cells or COS-7 cells.

According to one embodiment of this invention, a plasmid comprising a DNA sequence encoding a CD4-gelsolin fusion polypeptide operatively linked to a λPL promoter expression control sequence is expressed in E.coli to produce a CD4-gelsolin fusion polypeptide.

The polypeptides and proteins produced on expression of the DNA sequences of this invention may be isolated from fermentation or animal cell cultures and purified using any of a variety of conventional methods. One of skill in the art may select the most appropriate isolation and purification techniques without departing from the scope of this invention.

One can also produce gelsolin fusion polypeptides by chemical synthesis using conventional peptide synthesis techniques, such as solid phase synthesis [R.B. Merrifield, "Solid Phase Peptide Synthesis. I. The Synthesis of a Tetrapeptide", J. Am. Chem. Soc., 85, pp. 2149-54 (1963)].

Another method useful for producing gelsolin fusion polypeptides, in addition to genetic fusion and chemical synthesis, is chemically coupling the functional moiety to the gelsolin moiety. This method is useful for both chemical moieties and polypeptide moieties.

Several approaches are available for chemically coupling the gelsolin moiety to a polypeptide moiety. The preferred strategy is to identify or create sites on the polypeptide moiety through which it may be selectively linked to the gelsolin moiety without destroying the activity of the polypeptide moiety. Glycoproteins, such as CD4, have limited numbers of sugars that are useful as cross-linking sites. The sugars may be oxidized to aldehydes, and an aldehyde then reacted with an amine group on the gelsolin moiety to create an aldehyde-amine linkage [P.K. Nakane and A. Kawaoi, "Peroxidase Labelled Antibody: A New Method of Conjugation", J. Histochem. Cytochem., 22, p. 1084 (1974); T.-H. Liao et al., "Modification of Sialyl Residues of Sialoglycoprotein(s) of the Human Erythrocyte Surface", J. Biol. Chem., 248, pp. 8247-53 (1973)]. CD4 has two functional glycosylation sites, at amino acids +269 to +271 and +298 to +300 (see SEQ ID NO: 2). These are outside the gp120 binding region, which lies within the first 113 amino acids of the protein [B.H. Chao et al., "A 113-amino Acid Fragment of CD4 Produced in Escherichia coli Blocks Human Immunodeficiency Virus-induced Cell Fusion", J. Biol. Chem., 264, pp. 5812-17 (1989)]. Therefore, coupling CD4 through the carbohydrate should not interfere with its function. Alternatively, CD4 may be genetically engineered to eliminate one of the glycosylation sites, which would increase selectivity during linkage. We describe aldehyde-amine linkages using CD4 in Example II.
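
Whether this carbohydrate route is available depends on the length of the CD4 polypeptide: both glycosylation sites lie beyond residue +181, so a two-domain form such as CD4(181) carries neither, whereas CD4(375), used in Example II, carries both. The hypothetical Python helper below makes that check explicit using only the coordinates given in this paragraph.

```python
# Illustrative sketch; the helper name is hypothetical.
# Glycosylation sites named in the text (Figure 3 coordinates, inclusive).
GLYCOSYLATION_SITES = [(269, 271), (298, 300)]
# gp120 binding lies within the first 113 residues (Chao et al., cited above).
GP120_BINDING_END = 113


def carbohydrate_coupling_sites(x: int) -> list[tuple[int, int]]:
    """Sites present in CD4(X) that lie wholly outside the gp120-binding region."""
    return [
        (start, end)
        for start, end in GLYCOSYLATION_SITES
        if end <= x and start > GP120_BINDING_END
    ]


for x in (181, 280, 375):
    print(f"CD4({x}): {carbohydrate_coupling_sites(x)}")
```
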

Protein chemists have also developed specific chemistries for covalently coupling polypeptides through thiol groups. A polypeptide moiety having a free thiol may be linked to a gelsolin moiety containing a cysteine either directly, by formation of a disulfide bond, or indirectly, through a homo-bifunctional crosslinker. One example of a homo-bifunctional crosslinker is bismaleimidohexane (BMH), which has thiol-reactive maleimide groups and forms covalent bonds with free thiols. These methods require the construction of a gelsolin moiety with a cysteine at the amino- or carboxy-terminus. Peptide synthesizers (Example II, Section 2) are useful in these constructions.

If the polypeptide moiety does not have a free thiol group, such a group may be introduced. For example, the polypeptide may be bound to a thiol-containing amine. More particularly, an oxidized sugar on the polypeptide moiety may be reacted with the amine as described above.

Also, a cysteine may be introduced into the amino acid sequence of the polypeptide moiety by site-directed mutagenesis.

Alternatively, the polypeptide moiety and the gelsolin moiety may be crosslinked through hetero-bifunctional crosslinking agents. These are chosen so that one of the functional groups binds to a group on the polypeptide moiety and the other binds to the thiol on the gelsolin moiety. For example, a succinimide group could bind to an amine group on the polypeptide moiety, and a thiol-reactive group, such as a maleimide or an activated thiol, could bind to a cysteine on the gelsolin moiety.

We describe methods involving thiol linkage using CD4 in Example III. The Pierce Co. Immunotechnology Catalogue and Handbook, Volume 1, §§ E4-E12, E41-E48 and E31-E40, describes many useful homo- and hetero-bifunctional crosslinkers, thiol-containing amines and molecules with reactive groups.

Other methods useful for coupling both polypeptide and chemical moieties include, for example, those employing glutaraldehyde [M. Reichlin, "Use of Glutaraldehyde as a Coupling Agent for Proteins and Peptides", Methods in Enzymology, 70, pp. 159-65 (1980)], N-ethyl-N'-(3-dimethylaminopropyl)carbodiimide [T.L. Goodfriend et al., "Antibodies to Bradykinin and Angiotensin: A Use of Carbodiimides in Immunology", Science, 144, pp. 1344-46 (1964)] or a mixture of N-ethyl-N'-(3-dimethylaminopropyl)carbodiimide and a succinylated carrier [M.H. Klapper and I.M. Klotz, "Acylation with Dicarboxylic Acid Anhydrides", Methods in Enzymology, 25, pp. 531-36 (1972)]. Because chemical coupling is not limited to one site on the gelsolin moiety, more than one functional moiety can be coupled to each gelsolin moiety.

Multimeric and hetero-multimeric gelsolin fusion constructs according to this invention may be produced by binding gelsolin fusion polypeptides to phospholipids aggregated into a vesicle. The vesicle must comprise at least one phospholipid that binds to gelsolin, but may contain others as well. The polyphosphoinositides PIP and PIP2 are preferred components of the vesicle because they bind to gelsolin. To be effective, the vesicles preferably contain at least 3% PIP or PIP2. Other lipids that may comprise the vesicle include, but are not limited to, phosphatidylcholine (PC), phosphatidylethanolamine (PE) and phosphatidylserine (PS). One may also create vesicles containing detergents such as Triton.
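
The 3% guideline is easy to check for a planned lipid mixture. This Python sketch is illustrative only: its helper names and the example mixtures are hypothetical, and it treats amounts as moles because the text does not say whether the percentage is by mole or by weight.

```python
# Illustrative sketch; helper names and example amounts are hypothetical.
POLYPHOSPHOINOSITIDES = {"PIP", "PIP2"}


def pip_fraction(composition: dict[str, float]) -> float:
    """Fraction of the vesicle components contributed by PIP plus PIP2.

    Amounts are assumed to be in moles; the text does not state whether its
    percentage is by mole or by weight.
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must have a positive total")
    pip_total = sum(amount for lipid, amount in composition.items()
                    if lipid in POLYPHOSPHOINOSITIDES)
    return pip_total / total


def meets_minimum(composition: dict[str, float], minimum: float = 0.03) -> bool:
    """Check the preferred lower bound of 3% PIP or PIP2."""
    return pip_fraction(composition) >= minimum


# Hypothetical mixtures (nmol of each component), for illustration only.
print(meets_minimum({"PC": 80.0, "PE": 10.0, "PS": 5.0, "PIP2": 5.0}))
print(meets_minimum({"PC": 98.0, "PIP": 2.0}))
```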

The production of phospholipid vesicles is well known in the art [D.M. Haverstick and M. Glaser, "Visualization of Ca2+-induced Phospholipid Domains", Proc. Natl. Acad. Sci. USA, 84, pp. 4475-79 (1987)]. For example, dried lipids are mixed with water and the mixture is sonicated, producing vesicles. PIP should be sonicated more thoroughly than PIP2 to obtain vesicles of similar size and binding. The gelsolin fusion polypeptide is then added and allowed to bind to the vesicles. The resulting product is a multimeric gelsolin fusion construct.

The fact that a vesicle may comprise many different lipids and detergents allows great flexibility in engineering a fusion construct with desired characteristics. For example, one may produce vesicles that bind different numbers of gelsolin fusion polypeptides by varying the lipid composition of the starting materials to create larger vesicles, or by increasing the percentage of PIP or PIP2 in the vesicle. One may also alter the half-life of the functional moiety: we expect that these vesicles will eventually be degraded by lipases, so altering the lipid composition of the vesicle should vary its degradation rate.
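
The two levers named here, vesicle size and PIP/PIP2 content, can be compared with a rough estimate. This Python sketch is an illustration under stated assumptions, not a result from this disclosure: it treats the vesicle as a sphere, uses an assumed area per lipid of about 0.7 nm², and counts PIP/PIP2 headgroups on the outer surface as an upper bound on accessible binding sites. How many headgroups one gelsolin fusion polypeptide actually occupies is not stated in the text, and the helper name is hypothetical.

```python
import math

# Illustrative sketch; the helper name is hypothetical.
# Assumed area occupied by one lipid headgroup in a fluid bilayer (nm^2);
# this value is an assumption, not taken from the text.
AREA_PER_LIPID_NM2 = 0.7


def outer_leaflet_pip(radius_nm: float, pip_mole_fraction: float,
                      area_per_lipid_nm2: float = AREA_PER_LIPID_NM2) -> float:
    """Rough count of PIP/PIP2 headgroups exposed on a vesicle's outer surface.

    Treats the vesicle as a sphere, counts only the outer leaflet, and assumes
    PIP/PIP2 is spread evenly at the given mole fraction.
    """
    if radius_nm <= 0 or not 0 <= pip_mole_fraction <= 1:
        raise ValueError("radius must be positive and fraction within [0, 1]")
    outer_lipids = 4 * math.pi * radius_nm ** 2 / area_per_lipid_nm2
    return outer_lipids * pip_mole_fraction


# Doubling the radius quadruples the exposed headgroups;
# doubling the PIP/PIP2 fraction doubles them.
base = outer_leaflet_pip(50, 0.03)
print(round(outer_leaflet_pip(100, 0.03) / base, 6), round(outer_leaflet_pip(50, 0.06) / base, 6))
```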

When phospholipid vesicles containing cavities are prepared in the presence of a bioactive molecule, such as those illustrated herein, that molecule will become enclosed within the vesicles. Accordingly, it is possible to produce a multimeric gelsolin fusion construct that encloses a bioactive agent. These liposomes may fuse with cell membranes, delivering their contents to cells and adding the gelsolin fusion polypeptide to the cell membrane.

Hetero-multimeric gelsolin fusion constructs comprise at least two different functional moieties or two different gelsolin moieties. For example, hetero-multimeric gelsolin fusion constructs may comprise two moieties, or both a polypeptide moiety and a chemical moiety.

Hetero-multimeric gelsolin fusion constructs are especially useful when the properties of the different moieties complement one another. For example, receptors that bind to a particular target particle or cell can be combined with toxins or anti-retroviral agents in fusion proteins according to this invention to produce targeted toxic or anti-retroviral agents. Polypeptides useful as toxins include, but are not limited to, ricin, abrin, angiogenin, Pseudomonas Exotoxin A, pokeweed antiviral protein, saponin, gelonin and diphtheria toxin, or toxic portions thereof. Useful anti-retroviral agents include suramin, azidothymidine (AZT), dideoxycytidine and glucosidase inhibitors such as castanospermine, deoxynojirimycin and derivatives thereof.

Hetero-multimeric gelsolin fusion constructs according to this invention are also useful as diagnostic agents to identify the presence of a target molecule in a sample or in vivo. Such constructs comprise one functional moiety that is a recognition molecule, such as an immunoglobulin or a fragment thereof (Fab, dAb) that binds to the target molecule [see Ward et al., supra], and a second functional moiety that is a reporter group, such as a radionuclide, an enzyme (such as horseradish peroxidase) or a fluorescent or chemiluminescent marker. Typically, the reporter group will be bound directly to the recognition molecule; for example, HRP is bound directly to the immunoglobulin. Many reporter groups may be coupled to a multimeric gelsolin fusion construct, thereby enhancing the signal. These constructs may be used, for example, to replace antibodies as the recognition molecules that contact the sample in ELISA-type assays, or as in vivo imaging agents.

Hetero-multimeric gelsolin fusion constructs according to this invention may also be used as multi-vaccines. For example, one may produce such constructs using several different antigenic determinants from the same infective agent. One can also produce constructs comprising antigenic determinants from several infective agents, such as polio, measles, mumps and others used for childhood vaccination.

The pharmaceutical compositions of this invention typically comprise a pharmaceutically effective amount of a multimeric gelsolin fusion construct and a pharmaceutically acceptable carrier. Therapeutic methods of this invention comprise the step of treating patients in a pharmaceutically acceptable manner with those compositions. These compositions may be used to treat any mammal, including humans.

The pharmaceutical compositions of this invention may be in a variety of forms. These include, for example, solid, semi-solid and liquid dosage forms, such as tablets, pills, powders, liquid solutions or suspensions, liposomes, suppositories, injectable and infusible solutions and sustained release forms. The preferred form depends on the intended mode of administration and therapeutic application. The compositions also preferably include conventional pharmaceutically acceptable carriers and adjuvants, which are known to those of skill in the art.

Generally, the pharmaceutical compositions of the present invention may be formulated and administered using methods and compositions similar to those used for pharmaceutically important polypeptides such as, for example, alpha interferon. The fusion constructs of this invention may be administered by conventional routes of administration, such as parenteral, subcutaneous, intravenous, intramuscular or intralesional routes. It will be understood that conventional doses will vary depending upon the particular molecular moiety involved.

In order that this invention may be better understood, the following examples are set forth. These examples are for the purposes of illustration only and are not to be construed as limiting the scope of the invention in any manner.

In the examples that follow, the molecular biology techniques employed, such as cloning, cutting with restriction enzymes, isolating DNA fragments, filling out with Klenow enzyme and deoxyribonucleoside triphosphates (dXTP), ligating, transforming E.coli and the like, are conventional protocols exemplified and further described in J. Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989).

EXAMPLE I - PRODUCTION OF A CD4-GELSOLIN FUSION POLYPEPTIDE BY GENETIC FUSION

1. Cloning of pCD4-Gelsolin

We constructed a plasmid expression vector containing a DNA sequence encoding a CD4-gelsolin fusion polypeptide and used it to transform E.coli. The coding region contains a DNA sequence for CD4(181) fused to the 5' end of a 140 bp fragment encoding a 12 amino acid spacer and amino acids 150-173 of gelsolin, which include the PIP2 binding domain. We constructed the plasmid as follows (see Figure 6).

First, we produced a DNA sequence containing the human gelsolin PIP2 binding domain. The PIP2 binding domain is encompassed within amino acids +150 to +169 (nucleotides 541-600) of Figure 1 (SEQ ID NO: 1). We created this DNA sequence from the plasmid pMID, which contains the human gelsolin-encoding cDNA sequence of Figure 1 (SEQ ID NO: 1). (Plasmid pMID was the gift of David Kwiatkowski, Harvard Medical School, Boston, Massachusetts.) We amplified a cDNA sequence for the PIP2 binding domain using the polymerase chain reaction (PCR) (Sambrook et al., Chapter 14). We carried out all amplifications using Taq DNA polymerase and primers prephosphorylated with T4 polynucleotide kinase and ATP. We used the oligonucleotide ACE 144 (SEQ ID NO: 3) as the sense primer (which hybridizes to the anti-sense strand) and ACE 145 (SEQ ID NO: 4) as the anti-sense primer (see Figure 5). We filled out the amplified fragments with Klenow enzyme and dXTP. This produced blunt-ended 140 bp DNA fragments having a BglII site near the 5' end and an EcoRI site near the 3' end. The fragments encoded gelsolin amino acids +143 through +173 (see SEQ ID NO: 1).
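
The coordinates in this step are related by codon arithmetic: three nucleotides per residue with a fixed offset. In the Python sketch below, the offset of 91 is an assumption inferred from the single stated correspondence (residues +150 to +169 at nucleotides 541-600 of Figure 1), and the helper name is hypothetical. Applied to the amplified region, residues +143 to +173, it gives 93 bp of gelsolin coding sequence within the 140 bp fragment.

```python
# Illustrative sketch; the helper name is hypothetical.
# Nucleotide offset for Figure 1, inferred from the statement that gelsolin
# residues +150 to +169 correspond to nucleotides 541-600: 541 - 3 * 150 = 91.
FIG1_OFFSET = 541 - 3 * 150


def codon_span(first_aa: int, last_aa: int) -> tuple[int, int]:
    """Figure 1 nucleotide span (inclusive) encoding gelsolin residues first_aa..last_aa."""
    if last_aa < first_aa:
        raise ValueError("last_aa must not precede first_aa")
    return 3 * first_aa + FIG1_OFFSET, 3 * last_aa + FIG1_OFFSET + 2


start, end = codon_span(143, 173)
print(start, end, end - start + 1)  # span and coding length in bp
```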

Then we digested an intermediate plasmid, pNN03, with EcoRV and dephosphorylated the ends to prevent recircularization. Plasmid pNN03 is derived from pUC13 by the incorporation of a polylinker (Pharmacia PL Biochemicals). We subcloned the 140 bp fragments into this plasmid and called the resulting plasmid pGell.

We then inserted the BglII/EcoRI DNA fragment encoding the gelsolin PIP2 binding domain from pGell into a prokaryotic expression vector containing a DNA sequence encoding CD4(181) and derived from pEX56.

Plasmid pEX56 encodes CD4(181) fused in-frame to the 5' end of a DNA insert encoding Pseudomonas endotoxin. The insert is bordered by EcoRI sites at the 5' and 3' ends and contains a BglII site at the junction of the CD4-endotoxin sequence. The Pseudomonas endotoxin gene has been altered to remove the ribosome binding region. Plasmid pEX56 is created by site-directed mutagenesis of pEX46 (Example III, section 2, and Figure 13 (SEQ ID NO: 13)) with oligonucleotide T4-AID 176 (Figure 5, SEQ ID NO: 9). [The plasmid is described in co-pending PCT application PCT/US89/04584, incorporated herein by reference.]

We digested a first sample of pEX56 with EcoRI and BglII and isolated the 613 bp fragment that encodes CD4(181). Then we digested a second sample of pEX56 with EcoRI, dephosphorylated the fragments, and isolated the 3922 bp fragment representing the pEX56 vector portion. We ligated together the 3922 bp EcoRI fragment, the 613 bp EcoRI/BglII fragment and the 140 bp BglII/EcoRI fragment. We used this ligation mixture to transform E.coli JA221 [ATCC 33875] by standard CaCl2 procedures (see Sambrook, Chapter 1.82). We identified the plasmids pCD4-gelsolin and paCD4-gelsolin (opposite orientation and therefore non-expressing) by restriction digests of mini-plasmid DNA preparations. The plasmid map of pCD4-gelsolin is shown in Figure 8. The DNA sequence and predicted amino acid sequence of the CD4-gelsolin fusion polypeptide obtained are shown in Figure 7 (SEQ ID NO: 10). We have deposited an isolate of pCD4-gelsolin with In Vitro International, IVI-10253.
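
Two consequences of this three-fragment ligation can be checked with simple arithmetic: the expected size of the product and the number of ways the insert can enter the vector. In this Python sketch (helper names hypothetical), summing the nominal fragment sizes gives approximately 4675 bp for the recombinant plasmid. Because the combined CD4/gelsolin insert, joined at BglII, carries EcoRI ends on both sides, it can ligate in either orientation, which is consistent with recovering both pCD4-gelsolin and the non-expressing paCD4-gelsolin.

```python
# Illustrative sketch; helper names are hypothetical.
# Nominal fragment sizes from the ligation described above (bp).
FRAGMENTS_BP = {
    "pEX56 EcoRI vector fragment": 3922,
    "CD4(181) EcoRI/BglII fragment": 613,
    "gelsolin BglII/EcoRI fragment": 140,
}


def expected_size(fragments: dict[str, int]) -> int:
    """Approximate size of a circular product that contains each fragment once."""
    return sum(fragments.values())


def insert_orientations(vector_ends: tuple[str, str], insert_ends: tuple[str, str]) -> int:
    """Orientations in which an insert can ligate when its ends match the vector's.

    Identical ends on both sides (e.g. EcoRI/EcoRI) allow either orientation.
    """
    if sorted(vector_ends) != sorted(insert_ends):
        return 0
    return 2 if insert_ends[0] == insert_ends[1] else 1


print(expected_size(FRAGMENTS_BP))
# The CD4 and gelsolin fragments join at BglII, leaving EcoRI at both outer ends.
print(insert_orientations(("EcoRI", "EcoRI"), ("EcoRI", "EcoRI")))
```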

2. Expression of CD4-Gelsolin

We transformed E.coli JA221 and E.coli A89 (an htpR- protease-deficient mutant) with pCD4-gelsolin and paCD4-gelsolin. E.coli A89 is a tetracycline-sensitive mutant of E.coli SG936 [ATCC 39624]. We then tested the cultures for the production of CD4-gelsolin. Our results showed that pCD4-gelsolin, but not paCD4-gelsolin, produced a polypeptide of the molecular weight predicted for CD4-gelsolin.

We grew 5 ml overnight cultures in LB + 12.5
µg/ml tetracycline at 30°C. We diluted the overnight
cultures 1:10 into LB + 12.5 µg/ml tetracycline and
grew them until the optical density at 550 nm
was between 1 and 1.5. We then added each culture to an
equal volume of LB + 12.5 µg/ml tetracycline at 42°C.
After two hours we harvested the cells, lysed them, and
analyzed the lysates by SDS-polyacrylamide gel
electrophoresis (SDS-PAGE) for a protein band of the
size expected for a CD4-gelsolin fusion molecule.
We thus identified a protein with a molecular weight of
about 28 kD.

We have isolated this protein using the
protocol of Example III, section 2b.

EXAMPLE II CHEMICAL CROSS-LINKING OF A GELSOLIN
MOIETY TO CD4 VIA ALDEHYDE-AMINE LINKAGE

We cross-linked CD4(375) (a gift of Biogen,
Inc., Cambridge, Massachusetts) to a gelsolin moiety by
oxidizing sugars on the CD4 glycoprotein to aldehydes
and then reacting an aldehyde with an amine on the
gelsolin moiety to create an aldehyde-amine linkage.

1. Oxidation of CD4(375)

We dialyzed 100 µM CD4(375) protein against
0.1 M sodium acetate pH 5.0 at 4°C. We incubated the
preparation at 23°C for 1 hour with 1 mM aqueous sodium
periodate and immediately desalted it on a P6DG column
(BioRad, Richmond, California) that was equilibrated in
10 mM sodium acetate pH 5.0, 100 mM NaCl. We stored
the oxidized CD4(375) at 4°C for subsequent use or at
-70°C for long-term storage. We monitored the extent
of oxidation by measuring incorporation of tritiated
sodium borohydride. Typically 8-10 aldehydes per
CD4(375) were generated.
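The aldehyde count follows from simple stoichiometry: moles of tritium incorporated divided by moles of protein. The sketch below is a minimal illustration under loudly stated assumptions:
- Each reduced aldehyde takes up exactly one tritide.
- The specific activity is expressed per mole of transferable hydride.
- Background is taken from a non-oxidized control.

The function name and all numbers are hypothetical and are not data from this work.

```python
def aldehydes_per_protein(sample_dpm: float, control_dpm: float,
                          dpm_per_mol_hydride: float, mol_protein: float) -> float:
    """Estimate aldehydes per protein molecule from [3H]borohydride incorporation.

    Assumes each aldehyde reduced by tritiated sodium borohydride picks up exactly one
    tritide and that dpm_per_mol_hydride is the specific activity per mole of
    transferable hydride. control_dpm is background from a non-oxidized control.
    """
    mol_aldehyde = (sample_dpm - control_dpm) / dpm_per_mol_hydride
    return mol_aldehyde / mol_protein


# Hypothetical numbers for illustration only: 0.1 nmol protein,
# specific activity 2.0e16 dpm per mol hydride.
ratio = aldehydes_per_protein(sample_dpm=1.85e7, control_dpm=0.05e7,
                              dpm_per_mol_hydride=2.0e16, mol_protein=1.0e-10)
print(f"{ratio:.1f} aldehydes per CD4(375)")
```

In practice, the specific activity per hydride has to be derived from the supplier's value per mole of borohydride. That conversion is the main source of uncertainty in this kind of estimate.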

To confirm that oxidation did not interfere
with CD4(375) function, we assessed the ability of
the modified protein to bind gp120 in an ELISA format.
We coated IMMULON II® plates (Dynatech Laboratories,
Chantilly, Virginia) with gp120 (a gift of Biogen,
Inc., and commercially available from American
Bio-Technologies, Inc., Cambridge, Massachusetts),
added CD4(375) or oxidized CD4(375), and then
determined the binding with a reporter system using
OKT4 antibody (available from Ortho Diagnostics
Systems, Raritan, New Jersey) that was conjugated with
horseradish peroxidase. There was no difference in
binding of soluble CD4 protein or oxidized CD4 to
gp120. Amino acid analysis also showed the two samples
to be similar, with no apparent effect of oxidation on
individual amino acids.

2. Reaction of Oxidized CD4(375)
with the Gelsolin Moiety, GEL1

We synthesized a gelsolin moiety, GEL1, using
an Applied Biosystems 430A peptide synthesizer. GEL1
has the amino acid sequence Gly-Tyr-Gly-Lys-His-Val-
Val-Pro-Asn-Glu-Val-Val-Val-Gln-Arg-Leu-Phe-Gln-Val-
Lys-Gly-Arg-Arg (SEQ ID NO: 14). The final twenty amino
acids constitute the PIP2-binding sequence of gelsolin,
amino acids +150 to +169 (see SEQ ID NO: 1). To
crosslink GEL1 with CD4(375), we incubated varying
concentrations of GEL1 overnight at 23°C with 10
oxidized CD4(375) in the presence of 50 mM MES, pH 6.5,
and 5 mM sodium cyanoborohydride.
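You can confirm the correspondence between GEL1 and the gelsolin cDNA by translating the relevant stretch of SEQ ID NO: 1 and locating the last twenty residues of GEL1 in it. The minimal Python sketch below assumes only the standard genetic code and the reading frame defined by the ATG at nucleotide 1. The helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate in-frame DNA (whitespace ignored); '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# SEQ ID NO: 1, nt 481-600; nt 481 is the first base of codon 161 of the reading frame.
SEGMENT = ("TTCAAGTCTG GCCTGAAGTA CAAGAAAGGA GGTGTGGCAT CAGGATTCAA GCACGTGGTA "
           "CCCAACGAGG TGGTGGTGCA GAGACTCTTC CAGGTCAAAG GGCGGCGTGT GGTCCGTGCC")
FIRST_CODON = 161

# GEL1 (SEQ ID NO: 14) in one-letter code; its last 20 residues are the PIP2-binding sequence.
GEL1 = "GYGKHVVPNEVVVQRLFQVKGRR"
core = GEL1[-20:]

protein = translate(SEGMENT)
offset = protein.find(core)          # -1 would mean no exact match
start = FIRST_CODON + offset
end = start + len(core) - 1
print(protein)
print(f"{core} matches codons {start}-{end} of SEQ ID NO: 1")
```

The twenty residues map to codons 177-196 of the SEQ ID NO: 1 reading frame. The text numbers them +150 to +169. That numbering is consistent with counting from the residue after a 27-codon leader (177 - 27 = 150), but this is our inference and is not stated in the document. The first three residues of GEL1 (Gly-Tyr-Gly) do not match the gelsolin residues immediately preceding Lys177 (Ser-Gly-Phe).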

We tested the samples for crosslinking by SDS-PAGE.
Samples were analyzed either directly by staining
with Coomassie brilliant blue or by Western blotting
using an antiserum raised in rabbits against GEL1.
The immunogen consisted of GEL1 crosslinked to
keyhole limpet hemocyanin with glutaraldehyde.

We found a dose-dependent increase in the apparent
molecular weight of CD4 treated with GEL1, indicating
that the protein had become modified. At low peptide
concentrations, there was little effect on the mobility
of CD4(375), but when it was incubated with 1 mM GEL1,
approximately 50% of the CD4(375) migrated with an
increased apparent molecular weight consistent
with one GEL1 peptide per CD4(375). When
CD4(375) was incubated with 10 mM GEL1, all of the
protein shifted to a higher molecular weight form. We
observed a series of bands that likely correspond to
species with one, two and three gelsolin moieties per
CD4(375). The need for a large molar excess of GEL1
over CD4(375) to drive the crosslinking reaction is
consistent with the results obtained when modifying
periodate-oxidized CD4 with other amine-containing
reagents (see Example III).

To verify that GEL1 had been crosslinked to
CD4(375), we analyzed selected fractions by Western
blotting using antibodies against GEL1. A prominent
immunoreactive band was observed in the sample after
crosslinking. This band was absent from the Western
blot of an untreated CD4 sample.

3. Analysis of the CD4-Gelsolin
Fusion Polypeptide

We demonstrated above that the periodate oxidation
step of the crosslinking chemistry did not affect the
ability of CD4(375) to bind gp120. We have further
established that CD4(375)-gelsolin fusion polypeptides
bind to PIP2 vesicles.

We assayed the ability of CD4(375)-gelsolin
to associate with PIP or PIP2 vesicles using an
aggregation assay similar to that described by Janmey
et al., "Phosphoinositide Micelles and
Polyphosphoinositide-containing Vesicles Dissociate
Endogenous Gelsolin-actin complexes and Promote Actin
Assembly From the Fast-growing End of Actin Filaments
Blocked by Gelsolin", J. Biol. Chem., 262, pp. 12228-36
(1987). In the assay, the amount of protein used is
adjusted to take into account the molecular weight of
the CD4-gelsolin fusion polypeptide. Mg2+ causes
micelles of pure polyphosphoinositides to aggregate
into larger vesicles, increasing the turbidity of the
solution; gelsolin inhibits this aggregation. We found
that CD4(375)-gelsolin behaved like the GEL1 peptide in
this assay. Recombinant sCD4 alone had no activity
in this assay.

Because the junction between the gelsolin
peptide fragment and the spacer is unnatural, it may be
necessary to change the composition or length of the
spacer region in order to optimize function. This
would involve resynthesizing the gelsolin peptide
fragment with other sequences added at either the amino
or carboxy terminus of the polypeptide; the coupling
chemistry would not be affected. Alternatively, it may
be advantageous to change selected amino acids in the
binding sequence in order to change the affinity of the
fusion polypeptide for PIP2.

EXAMPLE III STRATEGIES FOR CROSSLINKING CD4
THROUGH THIOL GROUPS

We describe here three strategies for
crosslinking the CD4 polypeptide moiety with a gelsolin
moiety through thiol groups. They involve the
modification of the CD4 protein to contain a cysteine,
a free thiol or a thiol-reactive group.

1. Introducing a Free Thiol into CD4

First, a thiol group may be introduced into
CD4 using thiol-containing amines, such as cysteine,
cystamine or glutathione. Aldehydes are first
introduced into CD4, and the amine is then coupled to
them through an aldehyde-amine linkage (see Example
II). Once the thiol-containing CD4 is generated, it can
be selectively crosslinked to the gelsolin moiety.

We incubated periodate-oxidized CD4(375)
(0.5 mg/ml) overnight at 23°C in 50 mM MES, pH 6.5,
5 mM sodium cyanoborohydride with 20 mM of either
cysteine, oxidized cystamine or oxidized glutathione to
create CD4(cysteine), CD4(cystamine) and
CD4(glutathione). We treated the samples with 40 mM
DTT for 40 minutes at 23°C. We then dialyzed them
against storage buffer (10 mM sodium acetate, pH 5.0,
100 mM NaCl). We monitored the extent of modification
with Ellman's reagent. Briefly, we diluted the samples
into 100 µl of 100 mM sodium phosphate pH 8.0, 0.5 mM
DTNB and measured the absorbance at 410 nm after 5
minutes. We calibrated the samples against a standard
curve that was developed with reduced glutathione.
Both cystamine and glutathione treatments resulted in
three to five free thiol groups per CD4. For subsequent
studies, the preparations were concentrated to 5 mg/ml
using a CENTRICON-10® filtration unit (Amicon, Danvers,
Massachusetts).
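The Ellman's measurement can be reduced to arithmetic in two steps:
1. Fit a straight line to the glutathione standards. Each glutathione carries one thiol, so µM glutathione equals µM thiol.
2. Convert the sample absorbance to thiol concentration, correct for the dilution into the assay, and divide by the protein concentration.

The Python sketch below does this with an ordinary least-squares fit. The function names and every number (standards, absorbance, dilution, protein concentration) are hypothetical, chosen only to illustrate the calculation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def thiols_per_protein(a410, dilution_factor, protein_uM, std_uM, std_a410):
    """Free thiols per protein molecule from a DTNB (Ellman) reading.

    a410: absorbance of the diluted sample; dilution_factor: dilution into the assay;
    protein_uM: protein concentration of the undiluted sample;
    std_uM / std_a410: reduced-glutathione standard curve (one thiol per molecule).
    """
    slope, intercept = fit_line(std_uM, std_a410)
    thiol_in_assay_uM = (a410 - intercept) / slope
    return thiol_in_assay_uM * dilution_factor / protein_uM


# Hypothetical standards and sample, for illustration only.
standards_uM = [0.0, 10.0, 20.0, 40.0]
standards_a410 = [0.010, 0.146, 0.282, 0.554]
n_thiols = thiols_per_protein(a410=0.214, dilution_factor=10.0, protein_uM=40.0,
                              std_uM=standards_uM, std_a410=standards_a410)
print(f"{n_thiols:.2f} thiols per CD4")
```

Using a standard curve rather than a fixed extinction coefficient, as the text does, absorbs instrument- and wavelength-specific factors such as reading at 410 nm.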

These molecules may be bound to gelsolin
moieties through the thiol groups using
homobifunctional crosslinking agents with two
thiol-reactive groups, such as BMH (bismaleimidohexane)
or o- or p-phenylene dimaleimide. We believe that this
method will result in crosslinking because treatment of
CD4(cystamine) with these agents induced the formation
of CD4 dimers and higher molecular weight complexes.
With substoichiometric amounts of crosslinker, we were
able to drive crosslinking of CD4 to greater than 50%.
A similar strategy will be used with the
cysteine-containing gelsolin moiety, where a
dimaleimide agent will be used to generate crosslinked
complexes.

Alternatively, the moieties may be
crosslinked through disulfide bonds using conventional
techniques.

2. Introducing a Free Cysteine into
CD4 by Site-Specific Mutagenesis

Second, a free cysteine may be introduced into
the primary sequence of CD4 through genetic
engineering. Crosslinking to the gelsolin moiety is
then directed using the methods of section 1 of this
example. We describe here the construction and
isolation of two truncated forms of CD4 engineered to
contain cysteine residues at their C-termini.

a. Construction of pDC219
and Expression of CD4(111Cys)

To produce CD4(111Cys) we constructed the
expression plasmid pDC219. (See Figure 9.) We began
with p218-8, a plasmid in which the λPL promoter
controls the expression of CD4(111). This plasmid is
described in PCT patent application WO 89/0194,
p. 77/93, Figure 28. The DNA sequence for p218-8 is
depicted in Figure 10 (SEQ ID NO: 11). We digested a
first sample of p218-8 with PstI and BglII and isolated
the 3645 bp fragment. We then digested a second sample
of p218-8 with PstI and EcoRI and isolated the 269 bp
fragment. We digested a third sample of p218-8 with
EcoRI and BspMI and isolated the 395 bp fragment. We
isolated these fragments by electrophoresing the
digests on agarose gels, cutting out the relevant bands
and electroeluting the DNA fragments. We precipitated
the electroeluted DNA fragments with ethanol,
centrifuged the mixture to pellet the DNA fragments and
resuspended the fragments in 10 mM Tris-HCl, pH 8.0,
1 mM Na2EDTA.

We phosphorylated oligonucleotides T4AID-133
(SEQ ID NO: 5) and T4AID-134 (SEQ ID NO: 6) (Figure 5)
using bacteriophage T4 polynucleotide kinase. These
oligonucleotides contain a BglII recognition sequence.
Then we ligated the purified DNA fragments and the
oligonucleotides.

We used the reaction mixture to transform
E. coli DH1. We selected colonies that grew at 30°C in
the presence of 12.5 µg/ml tetracycline and analyzed
them for the correct sequences by digestion with BglII.
We subjected those plasmid DNAs that had the additional
BglII site to DNA sequence analysis. Thus we obtained
pDC219.

To produce CD4(111Cys), we transformed A89
cells with pDC219 and fermented the cells at a 10 liter
scale. (We achieved an expression level of 13%.) We
stored the cells as frozen cell pellets.

To isolate CD4(111Cys) we thawed 50 g of frozen
whole cells, suspended them in 20 mM Tris pH 7.5, 1 mM
EDTA, 0.4 mg/ml lysozyme, and mixed them with a
Polytron (Brinkmann Instruments, Westbury, N.Y.). We
stirred the cell slurry at room temperature for one
hour, then passed it three times through a prechilled
Manton-Gaulin French press (550 setting). We chilled
the lysate on ice between passages. We pelleted
particulates in an SA600 rotor for 15 minutes at 10,000
rpm. We washed the resulting pellet twice with a 1:4
dilution in 20 mM Tris pH 9.0 and pelleted it as
before. (All ratios given are whole cell weight to
buffer volume.) We then washed the pellet with a 1:4
dilution in 20 mM Tris pH 9.0 containing 0.5 M NaCl,
resuspending it with a Polytron, and spun it down under
the same conditions. We extracted the final pellet in
a 1:4 dilution of extraction buffer (7 M urea, 20 mM
Tris pH 9.0, 10 mM β-mercaptoethanol) with stirring at
room temperature for 15 minutes. We removed debris by
centrifugation in an SA600 rotor at 15,000 x g for 30
minutes.

We diluted the clarified supernatant 1:4 with
fresh extraction buffer and passed it over a Fast S
cation exchange column (Pharmacia) pre-equilibrated
with extraction buffer, at a ratio of 1 g whole
cells to 4 ml resin. We washed the column extensively
with extraction buffer. We then eluted the protein
with half-column-volume salt steps of extraction
buffer containing 0.05 M, 0.075 M, 0.1 M, 0.15 M and
0.2 M NaCl, respectively. CD4(111Cys) routinely eluted
in the 0.15 M NaCl step.

We pooled the CD4(111Cys) peak and diluted it
to an absorbance at 280 nm of less than 0.5. Then we
dialyzed the sample overnight, 1:100 V:V, with one
change, against 3 M urea, 20 mM Tris pH 7.5. We
diluted the dialysate to 1 M urea with 20 mM Tris
pH 7.5 and filtered it through a 0.45 µm sterile filter
unit. We bound CD4 from the filtrate to 6C6-Sepharose
for one hour at 4°C with rocking. 6C6 is a monoclonal
antibody developed at Biogen that recognizes CD4 and
blocks CD4 binding to gp120. Alternatively, one may
use anti-Leu-3a, a monoclonal antibody available from
Becton-Dickinson, Mountain View, California. Then we
poured the slurry into a column and washed it with
2 x 0.5 column volumes of 50 mM Tris pH 7.5, 0.5 M NaCl
(wash 2), and 2 x 0.5 column volumes of wash 1 buffer
(wash 3). CD4(111Cys) was eluted from the resin with a
series of 0.1 column volume additions of 50 mM glycine,
pH 3.0, 250 mM NaCl. We neutralized the eluate by adding
2 M Tris pH 9.0 to a final concentration of 50 mM.
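Bringing an eluate to a target buffer concentration from a concentrated stock is a one-line mass balance: stock × v = final × (sample + v). The Python sketch below solves it for the volume v to add. It assumes the eluate initially contains no Tris and that volumes are additive. The function name and the 1 ml example volume are hypothetical.

```python
def stock_volume_to_add(sample_volume: float, stock_conc: float, final_conc: float) -> float:
    """Volume of a stock solution to add so the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (sample_volume + v), assuming the sample
    initially contains none of the component and that volumes are additive.
    Any consistent units may be used.
    """
    if not 0 <= final_conc < stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_volume * final_conc / (stock_conc - final_conc)


# 1.0 ml of glycine eluate brought to 50 mM Tris with 2 M (2000 mM) Tris pH 9.0.
v_ml = stock_volume_to_add(sample_volume=1.0, stock_conc=2000.0, final_conc=50.0)
print(f"add {v_ml * 1000:.1f} ul of 2 M Tris")  # 1/39 ml, about 25.6 ul
```

The same relation can be read in the other direction. Adding 1/15 volume of 0.5 M HEPES, as in the CD4(180Cys) procedure later in this example, gives 0.5 M × (1/15)/(16/15) = 31.25 mM HEPES.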

The resulting affinity-purified protein was
90% CD4(111Cys) monomer with contaminating multimeric
bands. When run under reducing conditions, these
additional bands collapsed into the monomer, indicating
that they were disulfide-linked forms of the protein.
From 1 g wet weight of cells we recovered between 0.5
and 0.75 mg of CD4(111Cys). We assayed the gp120
binding activity and found it to be about half the
specific activity observed for full-length CD4.

We carried out biotinylation studies using
maleimidobutyryl biocytin (MBB) to test the
susceptibility of the engineered cysteine to
modification with the maleimide. We monitored biotin
labeling on Western blots using avidin-conjugated HRP
to track the biotin. Specific biotin labeling of
CD4(111Cys) was observed when fresh samples were
analyzed; however, the efficiency of labeling decreased
as the samples aged.

b. Construction of λPL180cys and
Expression of CD4(180Cys)

To produce CD4(180Cys), we constructed the
expression plasmid λPL180cys, in which a λPL promoter
controls the expression of a DNA sequence encoding
CD4(180Cys). (See Figure 11.)

We began with plasmid pBG391, an animal cell
expression vector that expresses CD4(375). (The DNA
sequence of this plasmid is set forth in Figure 12 (SEQ
ID NO: 12).) We cleaved pBG391 with StuI, which cuts
the CD4 gene at the codon for amino acid 182. We
phosphorylated oligonucleotides T4AID-137 (SEQ ID NO: 7)
and T4AID-138 (SEQ ID NO: 8) (Figure 5) and ligated them
into the StuI-cleaved pBG391. This generated pBG398C2.
We identified pBG398C2 by the presence of a BamHI site
generated at the junction of the StuI site and
T4AID-137.
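A short sequence check shows why a BamHI site marks the desired product. The Python sketch below rests on our assumptions:
- The StuI site in pBG391 lies in the same reading frame as in the CD4 cDNA of SEQ ID NO: 2 (…TTC CAG AAG GCC TCC…). StuI cuts AGG^CCT bluntly, so the CD4 side ends in …AAG G.
- The annealed T4AID-137/138 duplex is ligated with T4AID-137 reading forward from that end.

The local context string and helper names are hypothetical illustrations.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate in-frame DNA (whitespace ignored); '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


STUI = "AGGCCT"    # StuI cuts AGG^CCT, leaving blunt ends
BAMHI = "GGATCC"
T4AID_137 = "ATCCCTGTCCGTAGAAGCTTATCGAT"  # SEQ ID NO: 7
T4AID_138 = "ATCGATAAGCTTCTACGGACAGGGAT"  # SEQ ID NO: 8

# Assumed CD4 context around the StuI site, from SEQ ID NO: 2 nt 610-624,
# starting on a codon boundary: TTC CAG AAG GCC TCC.
cd4_context = "TTCCAGAAGGCCTCC"
cut = cd4_context.index(STUI) + 3       # StuI cuts after the third base of its site
junction = cd4_context[:cut] + T4AID_137
reverse_junction = cd4_context[:cut] + T4AID_138   # duplex inserted the other way round

print("138 anneals to 137:", reverse_complement(T4AID_137) == T4AID_138)
print("BamHI site created:", BAMHI in junction)
print("BamHI site if reversed:", BAMHI in reverse_junction)
print("translation up to stop:", translate(junction).split("*")[0])
```

Under these assumptions, the junction reads AAG GAT CCC TGT CCG TAG, which creates GGATCC and adds a Cys codon followed by an in-frame TAG stop. The reverse orientation gives GGATCG instead, so the BamHI site also reports orientation. The duplex additionally carries HindIII (AAGCTT) and ClaI (ATCGAT) sites, visible directly in SEQ ID NO: 7.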

Then we cleaved pBG398C2 with SacI and BglII
and isolated the 490 bp fragment. We cleaved pEX46
with SacI and BamHI and isolated the large fragment.
(The DNA sequence of pEX46 is set forth in Figure 13
(SEQ ID NO: 13).) Then we ligated the two fragments
together. This generated plasmid λPL180cys.

In 10 liter fermentations, CD4(180Cys) was
expressed at about 5% of the total cell protein.




We suspended fermentation cells at 8 ml/g
cell wet weight in 20 mM Tris-HCl, 1 mM Na2EDTA, pH
7.7, broke them in two passes through a French press
and washed the insoluble material twice with 20 ml/g
cell wet weight of 1 M guanidine-HCl, 1 M urea, 15 mM
sodium acetate, pH 5, followed by two washes in 20 mM
Tris-HCl, 1 mM Na2EDTA, pH 7.7. We extracted the washed
pellet with 25 ml/g cell wet weight of 6 M
guanidine-HCl, 20 mM Tris-HCl, 10 mM DTT, pH 7.7
overnight at room temperature. We spun the suspension
for 45 minutes in an SS-34 rotor at 20,000 rpm. We
diluted the supernatant 1:60 into cold 20 mM Tris-HCl,
pH 7.7 and added BSA to a final concentration of
0.5 mg/ml.

To generate microgram amounts of the protein,
we concentrated the diluted extract by ultrafiltration
using a PM10® membrane (Amicon), followed by affinity
purification on 6C6-Sepharose 4B. Alternatively,
CD4(180Cys) may be prepared as follows. The pH of the
diluted extract obtained as described above is lowered
to 7.0 with HCl, and the extract is loaded at 1% vol/vol
onto a Fast S column equilibrated in 20 mM Tris-HCl, pH
7.0. Bound protein is washed with 5 column volumes of
equilibration buffer and eluted with 0.2 M NaCl in the
same buffer. The elution pool is diluted with one
volume of 20 mM Tris-HCl, pH 7.7 and loaded on a
6C6-Sepharose 4B column. The bound protein is washed
and eluted from the affinity column in 50 mM glycine,
250 mM NaCl, pH 3.0. The elution fractions are
neutralized with 1/15 volume of 0.5 M HEPES pH 7.5,
pooled according to the A280 profile and stored at 4°C.

One may bind CD4(111Cys) or CD4(180Cys) to a
thiol-containing gelsolin moiety using the chemistries
described in section 1 of this example.

3. Hetero-bifunctional Crosslinking Agents

According to a third method, CD4 may also be
crosslinked with a cysteine-containing gelsolin moiety
using a hetero-bifunctional crosslinking agent. Such
crosslinkers include succinimidyl 4-(N-maleimidomethyl)
cyclohexane-1-carboxylate (SMCC),
m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) and
N-succinimidyl 3-(2-pyridyldithio)propionate (SPDP). The
succinimidyl arms of these crosslinkers bind to primary
amines in CD4. The thiol-reactive maleimide group of
SMCC and MBS and the activated (pyridyl disulfide)
thiol of SPDP react with the thiol of the cysteine in
the gelsolin moiety to form the covalent linkage.

To carry out the reaction with SMCC and MBS,
the crosslinker is incubated with CD4 for 0.5 hours at
pH 6.0 and 23°C. Unreacted crosslinker is then removed
on a desalting column. SPDP is used as described in
the Pharmacia Co. user's manual. A gelsolin moiety
having a free terminal cysteine is then added. The
mixture is incubated for 3 hours at 23°C, creating the
covalent linkage. Unreacted gelsolin moiety is removed
on a desalting column.

The extent and specificity of the
modification can be analyzed as described in
Example II. Because the lysine content of CD4 is high,
reactions with lysine would not provide much
specificity. However, by limiting the amount of
crosslinker added, it may be possible to direct
crosslinking to one or a small number of lysines that
are particularly reactive.

Alternatively, one may bind the thiol-reactive
group of the hetero-bifunctional crosslinker to a
thiol group introduced into CD4 and then bind the
succinimidyl arm to an amine in the gelsolin moiety.

EXAMPLE IV - MULTIMERIC GELSOLIN FUSION CONSTRUCT

We have shown that CD4-gelsolin fusion
polypeptides retain affinity for gp120 and that they
bind PIP2 vesicles through the gelsolin moiety. This
demonstrates that the chemistry we have developed to
produce multimeric gelsolin fusion constructs is sound.
As a next step, we produced and tested a multimeric
CD4(375)-gelsolin fusion construct.

Multimeric gelsolin fusion constructs
comprising CD4-gelsolin fusion polypeptides were
produced using methods that involve binding the fusion
polypeptides to PIP2 vesicles.

PIP2 vesicles were produced in the following
manner. PIP2 may be obtained as a lyophilized solid
(Sigma Chemical Co., St. Louis, Missouri). Water was
added to the dried sample to a concentration of 1 to 3
mg/ml, and the mixture was sonicated for 30 seconds to
2 minutes at maximum intensity in a Heat
Systems-Ultrasonics, Inc. (Farmingdale, New York)
W185® apparatus or its equivalent until an optically
clear solution formed. These samples were kept at 4°C
and used within a week, or they were stored frozen for
future use. For storage, the samples were divided into
aliquots, frozen in liquid nitrogen and stored at -70°C
until use. Prior to use, the samples were thawed
quickly under a stream of warm water and sonicated for
30 minutes at room temperature in a water bath
sonicator.

CD4-gelsolin fusion polypeptides were then
added to lipid at a 5- to 10-fold molar excess of lipid
over protein, and the mixture was incubated at room
temperature for about five minutes.
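Setting a 5- to 10-fold molar excess of lipid requires converting both components from mass to moles. The Python sketch below computes the volume of PIP2 stock to add for a given amount of fusion polypeptide. The protein amount, its molecular weight, and the PIP2 formula weight are all hypothetical placeholders. The formula weight of commercial PIP2 depends on its acyl chains and counterions, so check it against the supplier's value. The 2 mg/ml stock falls within the 1 to 3 mg/ml range described above.

```python
def lipid_volume_ul(protein_ug: float, protein_kda: float, molar_excess: float,
                    lipid_mg_per_ml: float, lipid_g_per_mol: float) -> float:
    """Microliters of lipid stock giving the requested lipid:protein molar ratio."""
    protein_nmol = protein_ug / protein_kda                          # 1 ug / 1 kDa = 1 nmol
    lipid_nmol = molar_excess * protein_nmol
    stock_nmol_per_ul = lipid_mg_per_ml * 1000.0 / lipid_g_per_mol   # 1 mg/ml = 1 ug/ul
    return lipid_nmol / stock_nmol_per_ul


# Hypothetical: 10 ug of a 40 kDa fusion polypeptide, 10-fold molar excess,
# 2 mg/ml PIP2 stock, and an assumed PIP2 formula weight of 1000 g/mol.
vol = lipid_volume_ul(protein_ug=10.0, protein_kda=40.0, molar_excess=10.0,
                      lipid_mg_per_ml=2.0, lipid_g_per_mol=1000.0)
print(f"{vol:.2f} ul of PIP2 stock")
```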

We tested the ability of the multimeric
CD4(375)-gelsolin fusion construct to bind gp120 in an
ELISA-type assay. Briefly, we coated plates with
gp120, added the fusion construct and assayed for
binding using anti-CD4 as the reporter antibody. We
did not detect binding of the multimeric
CD4(375)-gelsolin fusion construct to gp120.

We also tested the biological activity of the
fusion construct in a viral replication assay similar
to the one described in co-pending United States
application 07/583,022 (incorporated herein by
reference). Briefly, we incubated the fusion construct
with HIV, added cells from a T-cell line, and measured
the incidence of infection. The multimeric
CD4(375)-gelsolin fusion construct did not block
infection in this assay.

Investigating these results, we found that rsCD4
itself binds to PIP2 vesicles and that, in doing so, it
loses its ability to bind gp120. Recombinant sCD4
has pockets of positive charge that cause it to bind to
cation exchange matrices with high avidity at neutral
pH. Since PIP2 vesicles, like cation exchange
matrices, carry a high negative charge, we believe that
the binding of rsCD4 to PIP2 vesicles is due to its
ionic character.

Therefore, one may produce multimeric CD4-gelsolin
fusion constructs that bind gp120 by altering the
charge of the CD4 moiety so that it no longer binds
PIP2 vesicles. The first one hundred thirteen amino
acids of rsCD4, which contain the gp120 binding domain,
contain sixteen basic amino acid residues: thirteen
lysine residues and three arginine residues. Using
site-specific mutagenesis, one may change one or more
of these into histidine, a weakly basic amino acid that
is largely uncharged at neutral pH, or into neutral
amino acids. Among these alternate versions of CD4, one
may select molecules that bind gp120 but do not bind
PIP2 vesicles. We believe that these alternate versions
of CD4 would be useful to produce multimeric
CD4-gelsolin fusion constructs that possess gp120
binding ability.
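You can check the lysine and arginine count by translating the start of the CD4 coding sequence in SEQ ID NO: 10. The Python sketch below assumes that residue numbering starts at the mature N-terminal lysine, which immediately follows the initiator Met in SEQ ID NO: 10. It uses only the standard genetic code. The variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate in-frame DNA (whitespace ignored); '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# SEQ ID NO: 10, nt 1-360 (ATG followed by CD4 codons).
SEQ10_NT_1_360 = """
ATGAAAAAAG TAGTACTGGG CAAAAAAGGG GATACAGTGG AACTGACCTG TACAGCTTCC
CAGAAGAAGA GCATACAATT CCACTGGAAA AACTCCAACC AGATAAAGAT TCTGGGAAAT
CAGGGCTCCT TCTTAACTAA AGGTCCATCC AAGCTGAATG ATCGCGCTGA CTCAAGAAGA
AGCTTGTGGG ACCAAGGAAA CTTTCCCCTG ATCATCAAGA ATCTTAAGAT AGAAGACTCA
GATACTTACA TCTGTGAAGT GGAGGACCAG AAAGAAGAAG TTCAGCTGCT GGTTTTCGGA
TTGACTGCCA ACTCTGACAC CCACCTGCTT CAGGGGCAGA GCCTGACCCT GACCTTGGAG
"""

protein = translate(SEQ10_NT_1_360)
residues_1_113 = protein[1:114]     # drop the initiator Met so that residue 1 is Lys
counts = Counter(residues_1_113)
print(residues_1_113[:10], "...")
print("Lys:", counts["K"], "Arg:", counts["R"], "His:", counts["H"])
```

The result agrees with the thirteen lysines and three arginines stated in the text. The two histidines in this stretch (His27 and His107 under the assumed numbering) are not counted among the sixteen.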

Although they do not bind gp120, multimeric
CD4(181)-gelsolin fusion constructs have other uses.
For example, they are useful as immunogens to elicit
anti-CD4 antibodies. In diagnostic assays, they are
useful to detect the presence of anti-CD4 antibodies in
a sample. Some patients infected with HIV have anti-CD4
antibodies.

Positive charge at neutral pH and high salt
concentration is uncommon among proteins. Accordingly,
we do not believe that many proteins other than CD4
would be inactivated when employed to produce
multimeric gelsolin fusion constructs according to this
invention. Nevertheless, the ionic character and
lipid-binding properties of potential functional
moieties are factors to be considered in predicting the
ultimate biological activity and characteristics of
multimeric gelsolin fusion constructs produced using
them.

Microorganisms and recombinant DNA molecules
according to this invention are exemplified by cultures
deposited in the In Vitro International, Inc. culture
collection in Linthicum, Maryland, USA on May 4, 1990,
and identified as:

pCD4-gelsolin IVI-10253
p170.2 IVI-10252.

While we have hereinbefore described a number
of embodiments of this invention, it is apparent that
our basic embodiments can be altered to provide other
embodiments which utilize the processes and
compositions of this invention. Therefore, it will be
appreciated that the scope of this invention includes
all alternative embodiments and variations which are
defined in the foregoing specification and by the
claims appended hereto; and the invention is not to be
limited by the specific embodiments which have been
presented herein by way of example.

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2588 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATGGCTCCGC ACCGCCCCGC GCCCGCGCTG CTTTGCGCGC TGTCCCTGGC GCTGTGCGCG 60 

CTGTCGCTGC CCGTCCGCGC GGCCACTGCG TCGCGGGGGG CGTCCCAGGC GGGGGCGCCC 120 

CAGGGGCGGG TGCCCGAGGC GCGGCCCAAC AGCATGGTGG TGGAACACCC CGAGTTCCTC 180 

AAGGCAGGGA AGGAGCCTGG CCTGCAGATC TGGCGTGTGG AGAAGTTCGA TCTGGTGCCC 240 

GTGCCCACCA ACCTTTATGG AGACTTCTTC ACGGGCGACG CCTACGTCAT CCTGAAGACA 300 

GTGCAGCTGA GGAACGGAAA TCTGCAGTAT GACCTCCACT ACTGGCTGGG CAATGAGTGC 360 

AGCCAGGATG AGAGCGGGGC GGCCGCCATC TTTACCGTGC AGCTGGATGA CTACCTGAAC 420 

GGCCGGGCCG TGCAGCACCG TGAGGTCCAG GGCTTCGAGT CGGCCACCTT CCTAGGCTAC 480 

TTCAAGTCTG GCCTGAAGTA CAAGAAAGGA GGTGTGGCAT CAGGATTCAA GCACGTGGTA 540 

CCCAACGAGG TGGTGGTGCA GAGACTCTTC CAGGTCAAAG GGCGGCGTGT GGTCCGTGCC 600 

ACCGAGGTAC CTGTGTCCTG GGAGAGCTTC AACAATGGCG ACTGCTTCAT CCTGGACCTG 660 

GGCAACAACA TCCACCAGTG GTGTGGTTCC AACAGCAATC GGTATGAAAG ACTGAAGGCC 720 

ACACAGGTGT CCAAGGGCAT CCGGGACAAC GAGCGGAGTG GCCGGGCCCG AGTGCACGTG 780 

TCTGAGGAGG GCACTGAGCC CGAGGCGATG CTCCAGGTGC TGGGCCCCAA GCCGGCTCTG 840 

CCTGCAGGTA CCGAGGACAC CGCCAAGGAG GATGCGGCCA ACCGCAAGCT GGCCAAGCTC 900 

TACAAGGTCT CCAATGGTGC AGGGACCATG TCCGTCTCCC TCGTGGCTGA TGAGAACCCC 960 

TTCGCCCAGG GGGCCCTGAA GTCAGAGGAC TGCTTCATCC TGGACCACGG CAAAGATGGG 1020 

AAAATCTTTG TCTGGAAAGG CAAGCAGGCA AACACGGAGG AGAGGAAGGC TGCCCTCAAA 1080 

ACAGCCTCTG ACTTCATCAC CAAGATGGAC TACCCCAAGC AGACTCAGGT CTCGGTCCTT 1140 

CCTGAGGGCG GTGAGACCCC ACTGTTCAAG CAGTTCTTCA AGAACTGGCG GGACCCAGAC 1200 

CAGACAGATG GCCTGGGCTT GTCCTACCTT TCCAGCCATA TCGCCAACGT GGAGCGGGTG 1260 
CCCTTCGACG CCGCCACCCT GCACACCTCC ACTGCCATGG CCGCCCAGCA CGGCATGGAT 1320

GACGATGGCA CAGGCCAGAA ACAGATCTGG AGAATCGAAG GTTCCAACAA GGTGCCCGTG 1380 

GACCCTGCCA CATATGGACA GTTCTATGGA GGCGACAGCT ACATCATTCT GTACAACTAC 1440 

CGCCATGGTG GCCGCCAGGG GCAGATAATC TATAACTGGC AGGGTGCCCA GTCTACCCAG 1500 

GATGAGGTCG CTGCATCTGC CATCCTGACT GCTCAGCTGG ATGAGGAGCT GGGAGGTACC 1560 

CCTGTCCAGA GCCGTGTGGT CCAAGGCAAG GAGCCCGCCC ACCTCATGAG CCTGTTTGGT 1620 

GGGAAGCCCA TGATCATCTA CAAGGGCGGC ACCTCCCGCG AGGGCGGGCA GACAGCCCCT 1680 

GCCAGCACCC GCCTCTTCCA GGTCCGCGCC AACAGCGCTG GAGCCACCCG GGCTGTTGAG 1740 

GTATTGCCTA AGGCTGGTGC ACTGAACTCC AACGATGCCT TTGTTCTGAA AACCCCCTCA 1800 

GCCGCCTACC TGTGGGTGGG TACAGGAGCC AGCGAGGCAG AGAAGACGGG GGCCCAGGAG 1860 

CTGCTCAGGG TGCTGCGGGC CCAACCTGTG CAGGTGGCAG AAGGCAGCGA GCCAGATGGC 1920

TTCTGGGAGG CCCTGGGCGG GAAGGCTGCC TACCGCACAT CCCCACGGCT GAAGGACAAG 1980 

AAGATGGATG CCCATCCTCC TCGCCTCTTT GCCTGCTCCA ACAAGATTGG ACGTTTTGTG 2040 

ATCGAAGAGG TTCCTGGTGA GCTCATGCAG GAAGACCTGG CAACGGATGA CGTCATGCTT 2100 

CTGGACACCT GGGACCAGGT CTTTGTCTGG GTTGGAAAGG ATTCTCAAGA AGAAGAAAAG 2160 

ACAGAAGCCT TGACTTCTGC TAAGCGGTAC ATCGAGACGG ACCCAGCCAA TCGGGATCGG 2220 

CGGACGCCCA TCACCGTGGT GAAGCAAGGC TTTGAGCCTC CCTCCTTTGT GGGCTGGTTC 2280 

CTTGGCTGGG ATGATGATTA CTGGTCTGTG GACCCCTTGG ACAGGGCCAT GGCTGAGCTG 2340 

GCTGCCTGAG GAGGGGCAGG GCCCACCCAT GTCACCGGTC AGTGCCTTTT GGAACTGTCC 2400 

TTCCCTCAAA GAGGCCTTAG AGCGAGCAGA GCAGCTCTGC TATGAGTGTG TGTGTGTGTG 2460 

TGTGTTGTTT CTTTTTTTTT TTTTTACAGT ATCCAAAAAT AGCCCTGCAA AAATTCAGAG 2520 

TCCTTGCAAA ATTGTCTAAA ATGTCAGTGT TTGGGAAATT AAATCCAATA AAAACATTTT 2580 

GAAGTGTG 2588 

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1377 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

ATGAACCGGG GAGTCCCTTT TAGGCACTTG CTTCTGGTGC TGCAACTGGC GCTCCTCCCA 60 

GCAGCCACTC AGGGAAAGAA AGTGGTGCTG GGCAAAAAAG GGGATACAGT GGAACTGACC 120 

TGTACAGCTT CCCAGAAGAA GAGCATACAA TTCCACTGGA AAAACTCCAA CCAGATAAAG 180 

ATTCTGGGAA ATCAGGGCTC CTTCTTAACT AAAGGTCCAT CCAAGCTGAA TGATCGCGCT 240 

GACTCAAGAA GAAGCTTGTG GGACCAAGGA AACTTTCCCC TGATCATCAA GAATCTTAAG 300 

ATAGAAGACT CAGATACTTA CATCTGTGAA GTGGAGGACC AGAAGGAGGA GGTGCAATTG 360 

CTAGTGTTCG GATTGACTGC CAACTCTGAC ACCCACCTGC TTCAGGGGCA GAGCCTGACC 420 

CTGACCTTGG AGAGCCCCCC TGGTAGTAGC CCCTCAGTGC AATGTAGGAG TCCAAGGGGT 480 

AAAAACATAC AGGGGGGGAA GACCCTCTCC GTGTCTCAGC TGGAGCTCCA GGATAGTGGC 540 

ACCTGGACAT GCACTGTCTT GCAGAACCAG AAGAAGGTGG AGTTCAAAAT AGACATCGTG 600 

GTGCTAGCTT TCCAGAAGGC CTCCAGCATA GTCTACAAGA AAGAGGGGGA ACAGGTGGAG 660 

TTCTCCTTCC CACTCGCCTT TACAGTTGAA AAGCTGACGG GCAGTGGCGA GCTGTGGTGG 720 

CAGGCGGAGA GGGCTTCCTC CTCCAAGTCT TGGATCACCT CTGACCTGAA GAACAAGGAA 780

GTGTCTGTAA AACGGGTTAC CCAGGACCCT AAGCTCCAGA TGGGCAAGAA GCTCCCGCTC 840 

CACCTCACCC TGCCCCAGGC CTTGCCTCAG TATGCTGGCT CTGGAAACCT CACCCTGGCC 900 

CTTGAAGCGA AAACAGGAAA GTTGCATCAG GAAGTGAACC TGGTGGTGAT GAGAGCCACT 960 

CAGCTCCAGA AAAATTTGAC CTGTGAGGTG TGGGGACCCA CCTCCCCTAA GCTGATGCTG 1020 

AGCTTGAAAC TGGAGAACAA GGAGGCAAAG GTCTCGAAGC GGGAGAAGGC GGTGTGGGTG 1080 

CTGAACCCTG AGGCGGGGAT GTGGCAGTGT CTGCTGAGTG ACTCGGGACA GGTCCTGCTG 1140 

GAATCCAACA TCAAGGTTCT GCCCACATGG TCGACCCCGG TGCAGCCAAT GGCCCTGATT 1200 

GTGCTGGGGG GCGTCGCCGG CCTCCTGCTT TTCATTGGGC TAGGCATCTT CTTCTGTGTC 1260 
AGGTGCCGGC ACCGAAGGCG CCAAGCAGAG CGGATGTCTC AGATCAAGAG ACTCCGCAGT 1320
GAGAAGAAGA CCTGCCAGTG CCCTCACCGG TTTCAGAAGA CATGTAGCCC CATTTGA 1377

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
AGATCTACGG GGGCGTGGCA TCAGGATTCA AGCACGT 37

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GAATTCTTAG GCACGGACCA CACGCCG 27 

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
GGGGTGTTGA TAGTAAGATC TTGCA 25
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
AGATCTTACT ATCAAGA 17 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
ATCCCTGTCC GTAGAAGCTT ATCGAT 26 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
ATCGATAAGC TTCTACGGAC AGGGAT 26 
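For readers checking this transcription: SEQ ID NO: 7 and SEQ ID NO: 8 are exact reverse complements of each other, and each contains the HindIII (AAGCTT) and ClaI (ATCGAT) recognition sequences. The listing itself does not say how the two oligonucleotides were used. The minimal Python sketch below only verifies the sequence relationship; the helper name `reverse_complement` is our own, not part of the specification.

```python
# Sketch: verify that two listed oligonucleotides are reverse complements.
# Sequences are copied from the listing; the printed spacing is stripped.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove the spaces used to print sequences in groups of ten."""
    return "".join(seq.split())


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand (5'->3' of the partner)."""
    return clean(seq).translate(COMPLEMENT)[::-1]


SEQ7 = clean("ATCCCTGTCC GTAGAAGCTT ATCGAT")
SEQ8 = clean("ATCGATAAGC TTCTACGGAC AGGGAT")

print(len(SEQ7), len(SEQ8))               # 26 26
print(reverse_complement(SEQ7) == SEQ8)   # True
print("AAGCTT" in SEQ7, "ATCGAT" in SEQ7) # True True
```

The same helper can be applied to any other pair in this listing.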
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GGAGGACCAG AAAGAAGAAG TTCAGCTGCT GGTTTTCGGA TTGACT 46 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 654 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

ATGAAAAAAG TAGTACTGGG CAAAAAAGGG GATACAGTGG AACTGACCTG TACAGCTTCC 60 

CAGAAGAAGA GCATACAATT CCACTGGAAA AACTCCAACC AGATAAAGAT TCTGGGAAAT 120 

CAGGGCTCCT TCTTAACTAA AGGTCCATCC AAGCTGAATG ATCGCGCTGA CTCAAGAAGA 180 

AGCTTGTGGG ACCAAGGAAA CTTTCCCCTG ATCATCAAGA ATCTTAAGAT AGAAGACTCA 240 

GATACTTACA TCTGTGAAGT GGAGGACCAG AAAGAAGAAG TTCAGCTGCT GGTTTTCGGA 300 

TTGACTGCCA ACTCTGACAC CCACCTGCTT CAGGGGCAGA GCCTGACCCT GACCTTGGAG 360 

AGCCCCCCTG GTAGTAGCCC CTCAGTGCAA TGTAGGAGTC CAAGGGGTAA AAACATACAG 420 

GGGGGGAAGA CCCTCTCCGT GTCTCAGCTG GAGCTCCAGG ATAGTGGCAC CTGGACATGC 480 

ACTGTCTTGC AGAACCAGAA GAAGGTGGAG TTCAAAATAG ACATCGTGGT GCTAGCTTTC 540 

CAGAAGGGGA AGATCTACGG GGGCGTGGCA TCAGGATTCA AGCACGTGGT ACCCAACGAG 600 

GTGGTGGTGC AGAGACTCTT CCAGGTCAAA GGGCGGCGTG TGGTCCGTGC CTAA 654 
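Several of the shorter listed sequences can be found within SEQ ID NO: 10. The listing does not state their roles, so the following is only a consistency check on the sequences as transcribed here, written as a minimal Python sketch with our own helper names. It confirms that SEQ ID NO: 10 is 654 nt long, starts with ATG and ends with a TAA codon in the same reading frame, contains SEQ ID NO: 3 and SEQ ID NO: 9 verbatim, and that SEQ ID NO: 4, after its leading GAATTC (EcoRI) site, is the reverse complement of the last 21 nt of SEQ ID NO: 10.

```python
# Sketch: locate shorter listed sequences within SEQ ID NO: 10.
# Sequences are copied from this listing; helper names are illustrative.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove spaces and line breaks used in the printed listing."""
    return "".join(seq.split())


def reverse_complement(seq: str) -> str:
    return clean(seq).translate(COMPLEMENT)[::-1]


SEQ10 = clean("""
ATGAAAAAAG TAGTACTGGG CAAAAAAGGG GATACAGTGG AACTGACCTG TACAGCTTCC
CAGAAGAAGA GCATACAATT CCACTGGAAA AACTCCAACC AGATAAAGAT TCTGGGAAAT
CAGGGCTCCT TCTTAACTAA AGGTCCATCC AAGCTGAATG ATCGCGCTGA CTCAAGAAGA
AGCTTGTGGG ACCAAGGAAA CTTTCCCCTG ATCATCAAGA ATCTTAAGAT AGAAGACTCA
GATACTTACA TCTGTGAAGT GGAGGACCAG AAAGAAGAAG TTCAGCTGCT GGTTTTCGGA
TTGACTGCCA ACTCTGACAC CCACCTGCTT CAGGGGCAGA GCCTGACCCT GACCTTGGAG
AGCCCCCCTG GTAGTAGCCC CTCAGTGCAA TGTAGGAGTC CAAGGGGTAA AAACATACAG
GGGGGGAAGA CCCTCTCCGT GTCTCAGCTG GAGCTCCAGG ATAGTGGCAC CTGGACATGC
ACTGTCTTGC AGAACCAGAA GAAGGTGGAG TTCAAAATAG ACATCGTGGT GCTAGCTTTC
CAGAAGGGGA AGATCTACGG GGGCGTGGCA TCAGGATTCA AGCACGTGGT ACCCAACGAG
GTGGTGGTGC AGAGACTCTT CCAGGTCAAA GGGCGGCGTG TGGTCCGTGC CTAA
""")
SEQ3 = clean("AGATCTACGG GGGCGTGGCA TCAGGATTCA AGCACGT")
SEQ4 = clean("GAATTCTTAG GCACGGACCA CACGCCG")
SEQ9 = clean("GGAGGACCAG AAAGAAGAAG TTCAGCTGCT GGTTTTCGGA TTGACT")

print(len(SEQ10), len(SEQ10) % 3)       # 654 0
print(SEQ10[:3], SEQ10[-3:])            # ATG TAA
print(SEQ3 in SEQ10, SEQ9 in SEQ10)     # True True
# Compare SEQ ID NO: 4 (minus its 5' GAATTC) with the 3' end of SEQ ID NO: 10.
print(SEQ4[:6], SEQ4[6:] == reverse_complement(SEQ10[-21:]))  # GAATTC True
```

Only sequence identity is checked here; no internal reading-frame analysis is attempted.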
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4309 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GAATTCTTAC ACTTAGTTAA ATTGCTAACT TTATAGATTA CAAAACTTAG GAAATCGATT 60 

TGGATGAAAA AAGTAGTACT GGGCAAAAAA GGGGATACAG TGGAACTGAC CTGTACAGCT 120 

TCCCAGAAGA AGAGCATACA ATTCCACTGG AAAAACTCCA ACCAGATAAA GATTCTGGGA 180 

AATCAGGGCT CCTTCTTAAC TAAAGGTCCA TCCAAGCTGA ATGATCGCGC TGACTCAAGA 240 

AGAAGCTTGT GGGACCAAGG AAACTTTCCC CTGATCATCA AGAATCTTAA GATAGAAGAC 300 

TCAGATACTT ACATCTGTGA AGTGGAGGAC CAGAAGGAGG AGGTGCAATT GCTAGTGTTC 360 

GGATTGACTG CCAACTCTGA CACCCACCTG CTTCAGGGGT GATAGTAAGA TCCTGCAGCC 420 

CAGCTTGGGG ACCCTAGAGG TCCCCTTTTT TATTTTGAAT TGGGAGATCC CAATTCTCAT 480 

GTTTGACAGC TTATCATCGA TAAGCTAGCT TTAATGCGGT AGTTTATCAC AGTTAAATTG 540 

CTAACGCAGT CAGGCACCGT GTATGAAATC TAACAATGCG CTCATCGTCA TCCTCGGCAC 600 

CGTCACCCTG GATGCTGTAG GCATAGGCTT GGTTATGCCG GTACTGCCGG GCCTCTTGCG 660 

GGATATCGTC CATTCCGACA GCATCGCCAG TCACTATGGC GTGCTGCTAG CGCTATATGC 720 

GTTGATGCAA TTTCTATGCG CACCCGTTCT CGGAGCACTG TCCGACCGCT TTGGCCGCCG 780 

CCCAGTCCTG CTCGCTTCGC TACTTGGAGC CACTATCGAC TACGCGATCA TGGCGACCAC 840 

ACCCGTCCTG TGGATTCTCT ACGCCGGACG CATCGTGGCC GGCATCACCG GCGCCACAGG 900 

TGCGGTTGCT GGCGCCTATA TCGCCGACAT CACCGATGGG GAAGATCGGG CTCGCCACTT 960 

CGGGCTCATG AGCGCTTGTT TCGGCGTGGG TATGGTGGCA GGCCCCGTGG CCGGGGGACT 1020 

GTTGGGCGCC ATCTCCTTGC ACGCACCATT CCTTGCGGCG GCGGTGCTCA ACGGCCTCAA 1080 

CCTACTACTG GGCTGCTTCC TAATGCAGGA GTCGCATAAG GGAGAGCGTC GTCCGATGCC 1140 

CTTGAGAGCC TTCAACCCAG TCAGCTCCTT CCGGTGGGCG CGGGGCATGA CTATCGTCGC 1200 

CGCACTTATG ACTGTCTTCT TTATCATGCA ACTCGTAGGA CAGGTGCCGG CAGCGCTCTG 1260 
GGTCATTTTC GGCGAGGACC GCTTTCGCTG GAGCGCGACG ATGATCGGCC TGTCGCTTGC 1320 

GGTATTCGGA ATCTTGCACG CCCTCGCTCA AGCCTTCGTC ACTGGTCCCG CCACCAAACG 1380 

TTTCGGCGAG AAGCAGGCCA TTATCGCCGG CATGGCGGCC GACGCGCTGG GCTACGTCTT 1440 

GCTGGCGTTC GCGACGCGAG GCTGGATGGC CTTCCCCATT ATGATTCTTC TCGCTTCCGG 1500 

CGGCATCGGG ATGCCCGCGT TGCAGGCCAT GCTGTCCAGG CAGGTAGATG ACGACCATCA 1560 

GGGACAGCTT CAAGGATCGC TCGCGGCTCT TACCAGCCTA ACTTCGATCA CTGGACCGCT 1620 

GATCGTCACG GCGATTTATG CCGCCTCGGC GAGCACATGG AACGGGTTGG CATGGATTGT 1680 

AGGCGCCGCC CTATACCTTG TCTGCCTCCC CGCGTTGCGT CGCGGTGCAT GGAGCCGGGC 1740 

CACCTCGACC TGAATGGAAG CCGGCGGCAC CTCGCTAACG GATTCACCAC TCCAAGAATT 1800 

GGAGCCAATC AATTCTTGCG GAGAACTGTG AATGCGCAAA CCAACCCTTG GCAGAACATA 1860 

TCCATCGCGT CCGCCATCTC CAGCAGCCGC ACGCGGCGCA TCTCGGGGGA TGATCAGCTG 1920 

CCTCGCGCGT TTCGGTGATG ACGGTGAAAA CCTCTGACAC ATGCAGCTCC CGGAGACGGT 1980 

CACAGCTTGT CTGTAAGCGG ATGCCGGGAG CAGACAAGCC CGTCAGGGCG CGTCAGCGGG 2040 

TGTTGGCGGG TGTCGGGGCG CAGCCATGAC CCAGTCACGT AGCGATAGCG GAGTGTATAC 2100 

TGGCTTAACT ATGCGGCATC AGAGCAGATT GTACTGAGAG TGCACCATAT GCGGTGTGAA 2160 

ATACCGCACA GATGCGTAAG GAGAAAATAC CGCATCAGGC GCTCTTCCGC TTCCTCGCTC 2220 

ACTGACTCGC TGCGCTCGGT CGTTCGGCTG CGGCGAGCGG TATCAGCTCA CTCAAAGGCG 2280 

GTAATACGGT TATCCACAGA ATCAGGGGAT AACGCAGGAA AGAACATGTG AGCAAAAGGC 2340 

CAGCAAAAGG CCAGGAACCG TAAAAAGGCC GCGTTGCTGG CGTTTTTCCA TAGGCTCCGC 2400 

CCCCCTGACG AGCATCACAA AAATCGACGC TCAAGTCAGA GGTGGCGAAA GCCGACAGGA 2460 

CTATAAAGAT ACCAGGCGTT TCCCCCTGGA AGCTCCCTCG TGCGCTCTCC TGTTCCGACC 2520 

CTGCCGCTTA CCGGATACCT GTCCGCCTTT CTCCCTTCGG GAAGCGTGGC GCTTTCTCAA 2580 

TGCTCACGCT GTAGGTATCT CAGTTCGGTG TAGGTCGTTC GCTCCAAGCT GGGGTGTGTG 2640 

CACGAACCCC CCGTTCAGCC CGACCGCTGC GCCTTATCCG GTAACTATCG TCTTGAGTCC 2700 

AACCCGGTAA GACACGACTT ATCGCCACTG GCAGCAGCCA CTGGTAACAG GATTAGCAGA 2760 

GCGAGGTATG TAGGCGGTGC TACAGAGTTC TTGAAGTGGT GGCCTAACTA CGGCTACACT 2820 

AGAAGGACAG TATTTGGTAT CTGCGCTCTG CTGAAGCCAG TTACCTTCGG AAAAAGAGTT 2880 
GGTAGCTCTT GATCCGGCAA ACAAACCACC GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG 2940 

CAGCAGATTA CGCGCAGAAA AAAAGGATCT CAAGAAGATC CTTTGATCTT TTCTACGGGG 3000 

TCTGACGCTC AGTGGAACGA AAACTCACGT TAAGGGATTT TGGTCATGAG ATTATCAAAA 3060 

AGGATCTTCA CCTAGATCCT TTTCAGATCT CCCGATCTTT AGCTGTCTTG GTTTGCCCAA 3120 

AGCGCATTGC ATAATCTTTC AGGGTTATGC GTTGTTCCAT ACAACCTCCT TAGTACATGC 3180 

AACCATTATC ACCGCCAGAG GTAAAATAGT CAACACGCAC GGTGTTAGAT ATTTATCCCT 3240 

TGCGGTGATA GATTTAACGT ATGAGCACAA AAAAGAAACC ATTAACACAA GAGCAGCTTG 3300 

AGGACGCACG TCGCCTTAAA GCAATTTATG AAAAAAAGAA AAATGAACTT GGCTTATCCC 3360 

AGGAATCTGT CGCAGACAAG ATGGGGATGG GGCAGTCAGG CGTTGGTGCT TTATTTAATG 3420 

GCATCAATGC ATTAAATGCT TATAACGCCG CATTGCTTAC AAAAATTCTC AAAGTTAGCG 3480 

TTGAAGAATT TAGCCCTTCA ATCGCCAGAG AAATCTACGA GATGTATGAA GCGGTTAGTA 3540 

TGCAGCCGTC ACTTAGAAGT GAGTATGAGT ACCCTGTTTT TTCTCATGTT CAGGCAGGGA 3600 

TGTTCTCACC TAAGCTTAGA ACCTTTACCA AAGGTGATGC GGAGAGATGG GTAAGCACAA 3660 

CCAAAAAAGC CAGTGATTCT GCATTCTGGC TTGAGGTTGA AGGTAATTCC ATGACCGCAC 3720 

CAACAGGCTC CAAGCCAAGC TTTCCTGACG GAATGTTAAT TCTCGTTGAC CCTGAGCAGG 3780 

CTGTTGAGCC AGGTGATTTC TGCATAGCCA GACTTGGGGG TGATGAGTTT ACCTTCAAGA 3840 

AACTAATTAG GGATAGCGGT CAGGTGTTTT TACAACCACT AAACCCACAG TACCCAATGA 3900 

TCCCATGCAA TGAGAGTTGT TCCGTTGTGG GGAAAGTTAT CGCTAGTCAG TGGCCTGAAG 3960 

AGACGTTTGG CTGATCGGCA AGGTGTTCTG GTCGGCGCAT AGCTGATAAC AATTGAGCAA 4020 

GAATCTTCAT CGGGGCTGCA GCCCACGATG CGTCCGGCGT AGAGGATCTC TCACCTACCA 4080 

AACAATGCCC CCCTGCAAAA AATAAATTCA TATAAAAAAC ATACAGATAA CCATCTGCGG 4140 

TGATAAATTA TCTCTGGCGG TGTTGACATA AATACCACTG GCGGTGATAC TGAGCACATC 4200 

AGCAGGACGC ACTGACCACC ATGAAGGTGA CGCTCTTAAA ATTAAGCCCT GAAGAAGGGC 4260 

AGCATTCAAA GCAGAAGGCT TTGGGGTGTG TGATACGAAA CGAAGCATT 4309 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6151 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

GAATTAATTC CAGCTTGCTG TGGAATGTGT GTCAGTTAGG GTGTGGAAAG TCCCCAGGCT 60 

CCCCAGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA GTCAGCAACC AGGTGTGGAA 120 

AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA 180 

CCATAGTCCC GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 240 

CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC GCCTCGGCCT 300 

CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG GGTCCTCCTC GTATAGAAAC 360 

TCGGACCACT CTGAGACGAA GGCTCGCGTC CAGGCCAGCA CGAAGGAGGC TAAGTGGGAG 420 

GGGTAGCGGT CGTTGTCCAC TAGGGGGTCC ACTCGCTCCA GGGTGTGAAG ACACATGTCG 480 

CCCTCTTCGG CATCAAGGAA GGTGATTGGT TTATAGGTGT AGGCCACGTG ACCGGGTGTT 540 

CCTGAAGGGG GGCTATAAAA GGGGGTGGGG GCGCGTTCGT CCTCACTCTC TTCCGCATCG 600 

CTGTCTGCGA GGGCCAGCTG TTGGGCTCGC GGTTGAGGAC AAACTCTTCG CGGTCTTTCC 660 

AGTACTCTTG GATCGGAAAC CCGTCGGCCT CCGAACGGTA CTCCGCCACC GAGGGACCTG 720 

AGCGAGTCCG CATCGACCGG ATCGGAAAAC CTCTCGAGAA AGGCGTCTAA CCAGTCACAG 780 

TCGCAAGGTA GGCTGAGCAC CGTGGCGGGC GGCAGCGGGT GGCGGTCGGG GTTGTTTCTG 840 

GCGGAGGTGC TGCTGATGAT GTAATTAAAG TAGGCGGTCT TGAGACGGCG GATGGTCGAG 900 

GTGAGGTGTG GCAGGCTTGA GATCGATCTG GCCATACACT TGAGTGACAA TGACATCCAC 960 

TTTGCCTTTC TCTCCACAGG TGTCCACTCC CAGGTCCAAC TGGATCCAAG CTTCGACTCG 1020 

AGGAATTCCC CGAAGGAACA AAGCACCCTC CCCACTGGGC TCCTGGTTGC AGAGCTCCAA 1080 

GTCCTCACAC AGATACGCCT GTTTGAGAAG CAGCGGGCAA GAAAGACGCA AGCCCAGAGG 1140 

CCCTGCCATT TCTGTGGGCT CAGGTCCCTA CTGGCTCAGG CCCCTGCCTC CCTCGGCAAG 1200 

GCCACAATGA ACCGGGGAGT CCCTTTTAGG CACTTGCTTC TGGTGCTGCA ACTGGCGCTC 1260 
CTCCCAGCAG CCACTCAGGG AAAGAAAGTG GTGCTGGGCA AAAAAGGGGA TACAGTGGAA 1320 

CTGACCTGTA CAGCTTCCCA GAAGAAGAGC ATACAATTCC ACTGGAAAAA CTCCAACCAG 1380 

ATAAAGATTC TGGGAAATCA GGGCTCCTTC TTAACTAAAG GTCCATCCAA GCTGAATGAT 1440 

CGCGCTGACT CAAGAAGAAG CTTGTGGGAC CAAGGAAACT TTCCCCTGAT CATCAAGAAT 1500 

CTTAAGATAG AAGACTCAGA TACTTACATC TGTGAAGTGG AGGACCAGAA GGAGGAGGTG 1560 

CAATTGCTAG TGTTCGGATT GACTGCCAAC TCTGACACCC ACCTGCTTCA GGGGCAGAGC 1620 

CTGACCCTGA CCTTGGAGAG CCCCCCTGGT AGTAGCCCCT CAGTGCAATG TAGGAGTCCA 1680 

AGGGGTAAAA ACATACAGGG GGGGAAGACC CTCTCCGTGT CTCAGCTGGA GCTCCAGGAT 1740 

AGTGGCACCT GGACATGCAC TGTCTTGCAG AACCAGAAGA AGGTGGAGTT CAAAATAGAC 1800 

ATCGTGGTGC TAGCTTTCCA GAAGGCCTCC AGCATAGTCT ATAAGAAAGA GGGGGAACAG 1860 

GTGGAGTTCT CCTTCCCACT CGCCTTTACA GTTGAAAAGC TGACGGGCAG TGGCGAGCTG 1920 

TGGTGGCAGG CGGAGAGGGC TTCCTCCTCC AAGTCTTGGA TCACCTTTGA CCTGAAGAAC 1980 

AAGGAAGTGT CTGTAAAACG GGTTACCCAG GACCCTAAGC TCCAGATGGG CAAGAAGCTC 2040 

CCGCTCCACC TCACCCTGCC CCAGGCCTTG CCTCAGTATG CTGGCTCTGG AAACCTCACC 2100 

CTGGCCCTTG AAGCGAAAAC AGGAAAGTTG CATCAGGAAG TGAACCTGGT GGTGATGAGA 2160 

GCCACTCAGC TCCAGAAAAA TTTGACCTGT GAGGTGTGGG GACCCACCTC CCGTAAGCTG 2220 

ATGCTGAGTT TGAAACTGGA GAACAAGGAG GCAAAGGTCT CGAAGCGGGA GAAGGCGGTG 2280 

TGGGTGCTGA ACCCTGAGGC GGGGATGTGG CAGTGTCTGC TGAGTGACTC GGGACAGGTC 2340 

CTGCTGGAAT CCAACATCAA GGTTCTGCCC ACATGGTCGA CCCCGGTGCA GCCAATGGCC 2400 

CTGATTTGAG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT GTGACATAAT TGGACAAACT 2460 

ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT TTAAGTGTAT AATGTGTTAA 2520 

ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC TATGGAACTG ATGAATGGGA 2580 

GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC TCAGAAGAAA TGCCATCTAG 2640 

TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT CCAAAAAAGA AGAGAAAGGT 2700 

AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT TTGAGTCATG CTGTGTTTAG 2760 

TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG GAAAAAGCTG CACTGCTATA 2820 

CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT AGGCATAACA GTTATAATCA 2880 
TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG TCTGCTATTA ATAACTATGC 2940 

TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG GTTAATAAGG AATATTTGAT 3000 

GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC ATTTGTAGAG GTTTTACTTG 3060 

CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA TAAAATGAAT GCAATTGTTG 3120 

TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA AAGCAATAGC ATCACAAATT 3180 

TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG TTTGTCCAAA CTCATCAATG 3240 

TATCTTATCA TGTCTGGATC CTCTACGCCG GACGCATCGT GGCCGGCATC ACCGGCGCCA 3300 

CAGGTGCGGT TGCTGGCGCC TATATCGCCG ACATCACCGA TGGGGAAGAT CGGGCTCGCC 3360 

ACTTCGGGCT CATGAGCGCT TGTTTCGGCG TGGGTATGGT GGCAGGCCCG TGGCCGGGGG 3420 

ACTGTTGGGC GCCATCTCCT TGCATGCACC ATTCCTTGCG GCGGCGGTGC TCAACGGCCT 3480 

CAACCTACTA CTGGGCTGCT TCCTAATGCA GGAGTCGCAT AAGGGAGAGC GTCGACCGAT 3540 

GCCCTTGAGA GCCTTCAACC CAGTCAGCTC CTTCCGGTGG GCGCGGGGCA TGACTATCGT 3600 

CGCCGCACTT ATGACTGTCT TCTTTATCAT GCAACTCGTA GGACAGGTGC CGGCAGCGCT 3660 

CTGGGTCATT TTCGGCGAGG ACCGCTTTCG CTGGAGCGCG ACGATGATCG GCCTGTCGCT 3720 

TGCGGTATTC GGAATCTTGC ACGCCCTCGC TCAAGCCTTC GTCACTGGTC CCGCCACCAA 3780 

ACGTTTCGGC GAGAAGCAGG CCATTATCGC CGGCATGGCG GCCGACGCGC TGGGCTACGT 3840 

CTTGCTGGCG TTCGCGACGC GAGGCTGGAT GGCCTTCCCC ATTATGATTC TTCTCGCTTC 3900 

CGGCGGCATC GGGATGCCCG CGTTGCAGGC CATGCTGTCC AGGCAGGTAG ATGACGACCA 3960 

TCAGGGACAG CTTCAAGGAT CGCTCGCGGC TCTTACCAGC CTAACTTCGA TCACTGGACC 4020 

GCTGATCGTC ACGGCGATTT ATGCCGCCTC GGCGAGCACA TGGAACGGGT TGGCATGGAT 4080 

TGTAGGCGCC GCCCTATACC TTGTCTGCCT CCCCGCGTTG CGTCGCGGTG CATGGAGCCG 4140 

GGCCACCTCG ACCTGAATGG AAGCCGGCGG CACCTCGCTA ACGGATTCAC CACTCCAAGA 4200 

ATTGGAGCCA ATCAATTCTT GCGGAGAACT GTGAATGCGC AAACCAACCC TTGGCAGAAC 4260 

ATATCCATCG CGTCCGCCAT CTCCAGCAGC CGCACGCGGC GCATCTCGGG CCGCGTTGCT 4320 

GGCGTTTTTC CATAGGCTCC GCCCCCCTGA CGAGCATCAC AAAAATCGAC GCTCAAGTCA 4380 

GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG GAAGCTCCCT 4440 

CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC CTGTCCGCCT TTCTCCCTTC 4500 
GGGAAGCGTG GCGCTTTCTC AATGCTCACG CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT 4560 

TCGCTCCAAG CTGGGCTGTG TGCACGAACC CCCCGTTCAG CCCGACCGCT GCGCCTTATC 4620 

CGGTAACTAT CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC TGGCAGCAGC 4680 

CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT GCTACAGAGT TCTTGAAGTG 4740 

GTGGCCTAAC TACGGCTACA CTAGAAGGAC AGTATTTGGT ATCTGCGCTC TGCTGAAGCC 4800 

AGTTACCTTC GGAAAAAGAG TTGGTAGCTC TTGATCCGGC AAACAAACCA CCGCTGGTAG 4860 

CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT TACGCGCAGA AAAAAAGGAT CTCAAGAAGA 4920 

TCCTTTGATC TTTTCTACGG GGTCTGACGC TCAGTGGAAC GAAAACTCAC GTTAAGGGAT 4980 

TTTGGTCATG AGATTATCAA AAAGGATCTT CACCTAGATC CTTTTAAATT AAAAATGAAG 5040 

TTTTAAATCA ATCTAAAGTA TATATGAGTA AACTTGGTCT GACAGTTACC AATGCTTAAT 5100 

CAGTGAGGCA CGTATCTGAG CGATCTGTCT ATTTCGTTCA TCCATAGTTG CCTGACTCCC 5160 

CGTCGTGTAG ATAACTACGA TACGGGAGGG CTTACCATCT GGCCCCAGTG CTGCAATGAT 5220 

ACCGCGAGAC CCACGCTCAC CGGCTCCAGA TTTATCAGCA ATAAACCAGC CAGCCGGAAG 5280 

GGCCGAGCGC AGAAGTGGTC CTGCAACTTT ATCCGCCTCC ATCCAGTCTA TTAATTGTTG 5340 

CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT TAATAGTTTG CGCAACGTTG TTGCCATTGC 5400 

TGCAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT TCATTCAGCT CCGGTTCCCA 5460 

ACGATCAAGG CGAGTTACAT GATCCCCCAT GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG 5520 

TCCTCCGATC GTTGTCAGAA GTAAGTTGGC CGCAGTGTTA TCACTCATGG TTATGGCAGC 5580 

ACTGCATAAT TCTCTTACTG TCATGCCATC CGTAAGATGC TTTTCTGTGA CTGGTGAGTA 5640 

CTCAACCAAG TCATTCTGAG AATAGTGTAT GCGGCGACCG AGTTGCTCTT GCCCGGCGTC 5700 

AACACGGGAT AATACCGCGC CACATAGCAG AACTTTAAAA GTGCTCATCA TTGGAAAACG 5760 

TTCTTCGGGG CGAAAACTCT CAAGGATCTT ACCGCTGTTG AGATCCAGTT CGATGTAACC 5820 

CACTCGTGCA CCCAACTGAT CTTCAGCATC TTTTACTTTC ACCAGCGTTT CTGGGTGAGC 5880 

AAAAACAGGA AGGCAAAATG CCGCAAAAAA GGGAATAAGG GCGACACGGA AATGTTGAAT 5940 

ACTCATACTC TTCCTTTTTC AATATTATTG AAGCATTTAT CAGGGTTATT GTCTCATGAG 6000 

CGGATACATA TTTGAATGTA TTTAGAAAAA TAAACAAATA GGGGTTCCGC GCACATTTCC 6060 

CCGAAAAGTG CCACCTGACG TCTAAGAAAC CATTATTATC ATGACATTAA CCTATAAAAA 6120 
TAGGCGTATC ACGAGGCCCT TTCGTCTTCA A 6151 



WO 91/17170 



PCT/US91/02954 



- 72 - 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5727 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

GAATTCTTAC ACTTAGTTAA ATTGCTAACT TTATAGATTA CAAAACTTAG GAAATCGATT 60 

TGGATGAAAA AAGTAGTACT GGGCAAAAAA GGGGATACAG TGGAACTGAC CTGTACAGCT 120 

TCCCAGAAGA AGAGCATACA ATTCCACTGG AAAAACTCCA ACCAGATAAA GATTCTGGGA 180 

AATCAGGGCT CCTTCTTAAC TAAAGGTCCA TCCAAGCTGA ATGATCGCGC TGACTCAAGA 240 

AGAAGCTTGT GGGACCAAGG AAACTTTCCC CTGATCATCA AGAATCTTAA GATAGAAGAC 300 

TCAGATACTT ACATCTGTGA AGTGGAGGAC CAGAAGGAGG AGGTGCAATT GCTAGTGTTC 360 

GGATTGACTG CCAACTCTGA CACCCACCTG CTTCAGGGGC AGAGCCTGAC CCTGACCTTG 420 

GAGAGCCCCC CTGGTAGTAG CCCCTCAGTG CAATGTAGGA GTCCAAGGGG TAAAAACATA 480 

CAGGGGGGGA AGACCCTCTC CGTGTCTCAG CTGGAGCTCC AGGATAGTGG CACCTGGACA 540 

TGCACTGTCT TGCAGAACCA GAAGAAGGTG GAGTTCAAAA TAGACATCGT GGTGCTAGCT 600 

TTCCAGAAGG GGAAGATCTT TCCCGAGGGC GGCAGCCTGG CCGCGCTGAC CGCGCACCAG 660 

GCTTGCCACC TGCCGCTGGA GACTTTCACC CGTCATCGCC AGCCGCGCGG CTGGGAACAA 720 

CTGGAGCAGT GCGGCTATCC GGTGCAGCGG CTGGTCGCCC TCTACCTGGC GGCGCGGCTG 780 

TCGTGGAACC AGGTCGACCA GGTGATCCGC AACGCCCTGG CCAGCCCCGG CAGCGGCGGC 840 

GACCTGGGCG AAGCGATCCG CGAGCAGCCG GAGCAGGCCC GTCTGGCCCT GACCCTGGCC 900 

GCCGCCGAGA GCGAGCGCTT CGTCCGGCAG GGCACCGGCA ACGACGAGGC CGGCGCGGCC 960 

AACGCCGACG TGGTGAGCCT GACCTGCCCG GTCGCCGCCG GTGAATGCGC GGGCCCGGCG 1020 

GACAGCGGCG ACGCCCTGCT GGAGCGCAAC TATCCCACTG GCGCGGAGTT CCTCGGCGAC 1080 

GGCGGCGACG TCAGCTTCAG CACCCGCGGC ACGCAGAACT GGACGGTGGA GCGGCTGCTC 1140 

CAGGCGCACC GCCAACTGGA GGAGCGCGGC TATGTGTTCG TCGGCTACCA CGGCACCTTC 1200 

CTCGAAGCGG CGCAAAGCAT CGTCTTCGGC GGGGTGCGCG CGCGCAGCCA GGACCTCGAC 1260 
GCGATCTGGC GCGGTTTCTA TATCGCCGGC GATCCGGCGC TGGCCTACGG CTACGCCCAG 1320 

GACCAGGAAC CCGACGCACG CGGCCGGATC CGCAACGGTG CCCTGCTGCG GGTCTATGTG 1380 

CCGCGCTCGA GCCTGCCGGG CTTCTACCGC ACCAGCCTGA CCCTGGCCGC GCCGGAGGCG 1440 

GCGGGCGAGG TCGAACGGCT GATCGGCCAT CCGCTGCCGC TGCGCCTGGA CGCCATCACC 1500 

GGCCCCGAGG AGGAAGGCGG GCGCCTGGAG ACCATTCTCG GCTGGCCGCT GGCCGAGCGC 1560 

ACCGTGGTGA TTCCCTCGGC GATCCCCACC GACCCGCGCA ACGTCGGCGG CGACCTCGAC 1620 

CCGTCCAGCA TCCCCGACAA GGAACAGGCG ATCAGCGCCC TGCCGGACTA CGCCAGCCAG 1680 

CCCGGCAAAC CGCCGCGCGA GGACCTGAAG TAACTGCCGC GACCGGCCGG CTCCCTTCGC 1740 

AGGAGCCGGC CTTCTCGGGG CCTGGCCATA CATCAGGTTT TCCTGATGCC AGCCCAATCG 1800 

AATATGAATT CTCATCGATT TCCATGGGAT CCTGCAGCCC AGCTTGGGGA CCCTAGAGGT 1860 

CCCCTTTTTT ATTTTTTGAA TTGGGAGATC CAATTCTCAT GTTTGACAGC TTATCATCGA 1920 

AGCTAGCTTT AATGCGGTAG TTTATCACAG TTAAATTGCT AACGCAGTCA GGCACCGTGT 1980 

ATGAAATCTA ACAATGCGCT CATCGTCATC CTCGGCACCG TCACCCTGGA TGCTGTAGGC 2040 

ATAGGCTTGG TTATGCCGGT ACTGCCGGGC CTCTTGCGGG ATATCGTCCA TTCCGACAGC 2100 

ATCGCCAGTC ACTATGGCGT GCTGCTAGCG CTATATGCGT TGATGCAATT TCTATGCGCA 2160 

CCCGTTCTCG GAGCACTGTC CGACCGCTTT GGCCGCCGCC CAGTCCTGCT CGCTTCGCTA 2220 

CTTGGAGCCA CTATCGACTA CGCGATCATG GCGACCACAC CCGTCCTGTG GATTCTCTAC 2280 

GCCGGACGCA TCGTGGCCGG CATCACCGGC GCCACAGGTG CGGTTGCTGG CGCCTATATC 2340 

GCCGACATCA CCGATGGGGA AGATCGGGCT CGCCACTTCG GGCTCATGAG CGCTTGTTTC 2400 

GGCGTGGGTA TGGTGGCAGG CCCCGTGGCC GGGGGACTGT TGGGCGCCAT CTCCTTGCAC 2460 

GCACCATTCC TTGCGGCGGC GGTGCTCAAC GGCCTCAACC TACTACTGGG CTGCTTCCTA 2520 

ATGCAGGAGT CGCATAAGGG AGAGCGTCGT CCGATGCCCT TGAGAGCCTT CAACCCAGTC 2580 

AGCTCCTTCC GGTGGGCGCG GGGCATGACT ATCGTCGCCG CACTTATGAC TGTCTTCTTT 2640 

ATCATGCAAC TCGTAGGACA GGTGCCGGCA GCGCTCTGGG TCATTTTCGG CGAGGACCGC 2700 

TTTCGCTGGA GCGCGACGAT GATCGGCCTG TCGCTTGCGG TATTCGGAAT CTTGCACGCC 2760 

CTCGCTCAAG CCTTCGTCAC TGGTCCCGCC ACCAAACGTT TCGGCGAGAA GCAGGCCATT 2820 

ATCGCCGGCA TGGCGGCCGA CGCGCTGGGC TACGTCTTGC TGGCGTTCGC GACGCGAGGC 2880 



TGGATGGCCT TCCCCATTAT GATTCTTCTC GCTTCCGGCG GCATCGGGAT GCCCGCGTTG 2940 

CAGGCCATGC TGTCCAGGCA GGTAGATGAC GACCATCAGG GACAGCTTCA AGGATCGCTC 3000 

GCGGCTCTTA CCAGCCTAAC TTCGATCACT GGACCGCTGA TCGTCACGGC GATTTATGCC 3060 

GCCTCGGCGA GCACATGGAA CGGGTTGGCA TGGATTGTAG GCGCCGCCCT ATACCTTGTC 3120 

TGCCTCCCCG CGTTGCGTCG CGGTGCATGG AGCCGGGCCA CCTCGACCTG AATGGAAGCC 3180 

GGCGGCACCT CGCTAACGGA TTCACCACTC CAAGAATTGG AGCCAATCAA TTCTTGCGGA 3240 

GAACTGTGAA TGCGCAAACC AACCCTTGGC AGAACATATC CATCGCGTCC GCCATCTCCA 3300 

GCAGCCGCAC GCGGCGCATC TCGGGGGATG ATCAGCTGCC TCGCGCGTTT CGGTGATGAC 3360 

GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT 3420 

GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCGCA 3480 

GCCATGACCC AGTCACGTAG CGATAGCGGA GTGTATACTG GCTTAACTAT GCGGCATCAG 3540 

AGCAGATTGT ACTGAGAGTG CACCATATGC GGTGTGAAAT ACCGCACAGA TGCGTAAGGA 3600 

GAAAATACCG CATCAGGCGC TCTTCCGCTT CCTCGCTCAC TGACTCGCTG CGCTCGGTCG 3660 

TTCGGCTGCG GCGAGCGGTA TCAGCTCACT CAAAGGCGGT AATACGGTTA TCCACAGAAT 3720 

CAGGGGATAA CGCAGGAAAG AACATGTGAG CAAAAGGCCA GCAAAAGGCC AGGAACCGTA 3780 

AAAAGGCCGC GTTGCTGGCG TTTTTCCATA GGCTCCGCCC CCCTGACGAG CATCACAAAA 3840 

ATCGACGCTC AAGTCAGAGG TGGCGAAACC CGACAGGACT ATAAAGATAC CAGGCGTTTC 3900 

CCCCTGGAAG CTCCCTCGTG CGCTCTCCTG TTCCGACCCT GCCGCTTACC GGATACCTGT 3960 

CCGCCTTTCT CCCTTCGGGA AGCGTGGCGC TTTCTCAATG CTCACGCTGT AGGTATCTCA 4020 

GTTCGGTGTA GGTCGTTCGC TCCAAGCTGG GCTGTGTGCA CGAACCCCCC GTTCAGCCCG 4080 

ACCGCTGCGC CTTATCCGGT AACTATCGTC TTGAGTCCAA CCCGGTAAGA CACGACTTAT 4140 

CGCCACTGGC AGCAGCCACT GGTAACAGGA TTAGCAGAGC GAGGTATGTA GGCGGTGCTA 4200 

CAGAGTTCTT GAAGTGGTGG CCTAACTACG GCTACACTAG AAGGACAGTA TTTGGTATCT 4260 

GCGCTCTGCT GAAGCCAGTT ACCTTCGGAA AAAGAGTTGG TAGCTCTTGA TCCGGCAAAC 4320 

AAACCACCGC TGGTAGCGGT GGTTTTTTTG TTTGCAAGCA GCAGATTACG CGCAGAAAAA 4380 

AAGGATCTCA AGAAGATCCT TTGATCTTTT CTACGGGGTC TGACGCTCAG TGGAACGAAA 4440 

ACTCACGTTA AGGGATTTTG GTCATGAGAT TATCAAAAAG GATCTTCACC TAGATCCTTT 4500 
TCAGATCTCC CGATCTTTAG CTGTCTTGGT TTGCCCAAAG CGCATTGCAT AATCTTTCAG 4560 

GGTTATGCGT TGTTCCATAC AACCTCCTTA GTACATGCAA CCATTATCAC CGCCAGAGGT 4620 

AAAATAGTCA ACACGCACGG TGTTAGATAT TTATCCCTTG CGGTGATAGA TTTAACGTAT 4680 

GAGCACAAAA AAGAAACCAT TAACACAAGA GCAGCTTGAG GACGCACGTC GCCTTAAAGC 4740 

AATTTATGAA AAAAAGAAAA ATGAACTTGG CTTATCCCAG GAATCTGTCG CAGACAAGAT 4800 

GGGGATGGGG CAGTCAGGCG TTGGTGCTTT ATTTAATGGC ATCAATGCAT TAAATGCTTA 4860 

TAACGCCGCA TTGCTTACAA AAATTCTCAA AGTTAGCGTT GAAGAATTTA GCCCTTCAAT 4920 

CGCCAGAGAA ATCTACGAGA TGTATGAAGC GGTTAGTATG CAGCCGTCAC TTAGAAGTGA 4980 

GTATGAGTAC CCTGTTTTTT CTCATGTTCA GGCAGGGATG TTCTCACCTA AGCTTAGAAC 5040 

CTTTACCAAA GGTGATGCGG AGAGATGGGT AAGCACAACC AAAAAAGCCA GTGATTCTGC 5100 

ATTCTGGCTT GAGGTTGAAG GTAATTCCAT GACCGCACCA ACAGGCTCCA AGCCAAGCTT 5160 

TCCTGACGGA ATGTTAATTC TCGTTGACCC TGAGCAGGCT GTTGAGCCAG GTGATTTCTG 5220 

CATAGCCAGA CTTGGGGGTG ATGAGTTTAC CTTCAAGAAA CTAATTAGGG ATAGCGGTCA 5280 

GGTGTTTTTA CAACCACTAA ACCCAGAGTA CCCAATGATC CCATGCAATG AGAGTTGTTC 5340 

CGTTGTGGGG AAAGTTATCG CTAGTCAGTG GCCTGAAGAG ACGTTTGGCT GATCGGCAAG 5400 

GTGTTCTGGT CGGCGCATAG CTGATAACAA TTGAGCAAGA ATCTTCATCG GGGCTGCAGC 5460 

CCACGATGCG TCCGGCGTAG AGGATCTCTC ACCTACCAAA CAATGCCCCC CTGCAAAAAA 5520 

TAAATTCATA TAAAAAAGAT ACAGATAACC ATCTGCGGTG ATAAATTATC TCTGGCGGTG 5580 

TTGACATAAA TACCACTGGC GGTGATACTG AGCACATCAG CAGGACGCAC TGACCACCAT 5640 

GAAGGTGACG CTCTTAAAAT TAAGCCCTGA AGAAGGGCAG CATTCAAAGC AGAAGGCTTT 5700 

GGGGTGTGTG ATACGAAACG AAGCATT 5727 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Gly Tyr Gly Lys His Val Val Pro Asn Glu Val Val Val Gln Arg Leu 
1               5                   10                  15 

Phe Gln Val Lys Gly Arg Arg 
20 
CLAIMS 

We claim: 

1. A recombinant DNA molecule comprising a 
DNA sequence encoding a gelsolin fusion polypeptide 
comprising a first DNA sequence encoding a polypeptide 
moiety and a second DNA sequence comprising a gelsolin 
moiety. 

2. The recombinant DNA molecule according 
to claim 1, wherein the gelsolin moiety is derived from 
human plasma gelsolin. 

3. The recombinant DNA molecule according 
to claim 2, wherein the gelsolin moiety comprises amino 
acids +1 to +169 of Figure 1 (SEQ ID NO:1). 

4. The recombinant DNA molecule according 
to claim 3, wherein the gelsolin moiety comprises amino 
acids +150 to +169 of Figure 1 (SEQ ID NO:1). 

5. The recombinant DNA molecule according 
to claim 1, wherein the polypeptide moiety is selected 
from the group consisting of viral receptors, cell 
receptors, cell ligands, bacterial immunogens, 
parasitic immunogens, viral immunogens, immunoglobulins 
or fragments thereof that bind to target molecules, 
enzymes, enzyme inhibitors, enzyme substrates, 
cytokines, growth factors, colony stimulating factors, 
hormones and toxins. 



6. The recombinant DNA molecule according 
to claim 5, wherein the polypeptide moiety is a soluble 
CD4 protein. 
7. The recombinant DNA molecule according 
to claim 6, wherein the soluble CD4 protein is selected 
from the group consisting of CD4(111), CD4(111cys), 
CD4(180cys), CD4(181), CD4(183), CD4(187), CD4(345) and 
CD4(375). 

8. The recombinant DNA molecule according 
to claim 7 which is pCD4-gelsolin. 

9. The recombinant DNA molecule according 
to claim 5, wherein the polypeptide moiety is a cell 
receptor or a cell ligand selected from the group 
consisting of ICAM1, ELAM1, VCAM1, VCAM1b, LFA3, CDX 
and VLA4. 

10. The recombinant DNA molecule according 
to claim 1, wherein the DNA sequence encoding a 
gelsolin fusion polypeptide is operatively linked to an 
expression control sequence. 

11. The recombinant DNA molecule according 
to claim 10, wherein the expression control sequence is 
selected from the group consisting of the early and 
late promoters of SV40 or adenovirus, the lac system, 
the trp system, the TAC or TRC system, the major 
operator and promoter regions of phage λ, the control 
regions of fd coat protein, the promoter for 
3-phosphoglycerate kinase or other glycolytic enzymes, 
the promoters of acid phosphatase, the promoters of the 
yeast α-mating factors, the polyhedron promoter of the 
baculovirus system and other sequences known to control 
the expression of genes of prokaryotic or eukaryotic 
cells or their viruses, and various combinations 
thereof. 
12. A recombinant DNA molecule comprising a 
DNA sequence encoding a lipid binding protein fusion 
polypeptide comprising a first DNA sequence encoding a 
polypeptide moiety and a second DNA sequence encoding a 
lipid binding protein moiety. 

13. The recombinant DNA molecule according 
to claim 12, wherein the lipid binding protein moiety 
is selected from the group consisting of protein kinase 
C, lipocortin, severin, villin, fragmin, profilin, 
cofilin, Cap42(a), gCap39 / Cap2, destrin and DNase I. 

14. A unicellular host transformed with a 
recombinant DNA molecule according to claim 1 or 12. 

15. The unicellular host according to 
claim 14, selected from the group consisting of E. coli, 
Pseudomonas, Bacillus, Streptomyces, fungi, such as 
yeasts, and animal cells, such as CHO and mouse cells, 
African green monkey cells, such as COS-1, COS-7, 
BSC 1, BSC 40, and BMT 10, insect cells, and human 
cells and plant cells in tissue culture. 

16. The unicellular host according to 
claim 15, said host being a COS-7 cell or a CHO cell. 

17. A lipid binding protein fusion 
polypeptide comprising a functional moiety and a lipid 
binding protein moiety. 

18. The lipid binding protein fusion protein 
according to claim 17, wherein the lipid binding 
protein is selected from the group consisting of 
villin, severin, fragmin, profilin, cofilin, Cap42(a), 
gCap39, Cap2 and destrin. 
19. The lipid binding protein fusion 
polypeptide according to claim 17, wherein the lipid 
binding protein is selected from the group consisting 
of protein kinase C, lipocortin and DNase I. 

20. A gelsolin fusion polypeptide comprising 
a functional moiety and a gelsolin moiety. 

21. The gelsolin fusion polypeptide 
according to claim 20, wherein the functional moiety is 
a polypeptide moiety. 

22. The gelsolin fusion polypeptide 
according to claim 20, wherein said functional moiety 
is selected from the group consisting of viral 
receptors, cell receptors, cell ligands, bacterial 
immunogens, parasitic immunogens, viral immunogens, 
immunoglobulins or fragments of them that bind to 
target molecules, enzymes, enzyme inhibitors, enzyme 
substrates, cytokines, growth factors, colony 
stimulating factors, hormones and toxins. 

23. The gelsolin fusion polypeptide 
according to claim 22, wherein said functional moiety 
is a soluble CD4 protein. 

24. The gelsolin fusion polypeptide 
according to claim 23, wherein the soluble CD4 protein 
is selected from the group consisting of CD4(111), 
CD4(111cys), CD4(180cys), CD4(181), CD4(183), CD4(187), 
CD4(345), CD4(375), CD4(Cystamine), CD4(Cysteine) and 
CD4(Glutathione). 

25. The gelsolin fusion polypeptide 
according to claim 22, wherein said functional moiety 
is a cell receptor or a cell ligand selected from the 
group consisting of ICAM1, ELAM1, VCAM1, VCAM1b, LFA3, 
CDX and VLA4. 

26. The gelsolin fusion polypeptide 
according to claim 21, wherein the C-terminus of the 
polypeptide moiety is fused to the N-terminus of the 
gelsolin moiety. 

27. The gelsolin fusion polypeptide 
according to claim 21, wherein the polypeptide moiety 
is chemically coupled to the gelsolin moiety. 

28. The gelsolin fusion polypeptide 
according to claim 27, wherein the polypeptide moiety 
is chemically coupled to the gelsolin moiety through an 
aldehyde-amine linkage. 

29. The gelsolin fusion polypeptide 
according to claim 27, wherein the polypeptide moiety 
is chemically coupled to the gelsolin moiety through a 
thiol group. 

30. The gelsolin fusion polypeptide 
according to claim 27, wherein the polypeptide moiety 
comprises an amino-terminal or carboxy-terminal 
cysteine. 

31. The gelsolin fusion polypeptide 
according to claim 20, wherein said functional moiety 
is selected from the group consisting of toxins, anti- 
retroviral agents, enzyme substrates and enzyme 
inhibitors. 

32. The gelsolin fusion polypeptide 
according to claim 31, wherein the functional moiety is 
AZT. 



33. The gelsolin fusion polypeptide 
according to claim 20, comprising a reporter group 
selected from the group consisting of enzymes, 
radionuclides, fluorescent markers and chemiluminescent 
markers. 

34. A gelsolin fusion construct comprising a 
gelsolin fusion polypeptide and a vesicle comprising a 
polyphosphoinositide, said construct being multimeric 
or hetero-multimeric. 

35. The gelsolin fusion construct according 
to claim 34, wherein the polyphosphoinositide is PIP or 
PIP2. 

36. The gelsolin fusion construct according 
to claim 35, said construct comprising a CD4-gelsolin 
fusion polypeptide. 

37. The gelsolin fusion construct according 
to claim 36, wherein said CD4-gelsolin fusion 
polypeptide is CD4(181)-gelsolin fusion polypeptide. 

38. The gelsolin fusion construct according 
to claim 34, selected from the group consisting of 
ELAM1-gelsolin fusion polypeptides, VCAM1-gelsolin 
fusion polypeptides, VCAM1b-gelsolin fusion 
polypeptides, ICAM1-gelsolin fusion polypeptides, 
CDX-gelsolin fusion polypeptides, VLA4-gelsolin fusion 
polypeptides and LFA3-gelsolin fusion polypeptides. 

39. The hetero-multimeric gelsolin fusion 
construct according to claim 34, said construct 
comprising a first functional moiety selected from the 
group consisting of viral receptors, cell receptors and 
cell ligands, and a second functional moiety selected 
from the group consisting of toxins and anti- 
retroviral agents. 

40. The hetero-multimeric gelsolin fusion 
construct according to claim 34, said construct 
comprising a recognition molecule and a reporter group. 

41. The hetero-multimeric gelsolin fusion 
construct according to claim 34, said construct
comprising at least two immunogens. 

42. The gelsolin fusion construct according 
to claim 34, said construct comprising a vesicle that 
consists essentially of PIP or PIP2.

43. The gelsolin fusion construct according 
to claim 34, wherein the vesicle comprises lipids 
selected from the group consisting of PC, PE and PS. 

44. The gelsolin fusion construct according 
to claim 34, said construct comprising a mixed lipid 
vesicle. 

45. The gelsolin fusion construct according 
to claim 34, wherein the vesicle comprises a detergent. 

46. The gelsolin fusion construct according 
to claim 34, wherein said vesicle contains a bioactive 
agent.

47. A lipid binding protein fusion construct
comprising a lipid binding protein fusion polypeptide
and a vesicle comprising a lipid capable of binding to
said lipid binding protein fusion polypeptide, said
construct being multimeric or hetero-multimeric.

48. The lipid binding protein fusion
construct according to claim 47, wherein the lipid 
binding protein is selected from the group consisting 
of villin, severin, fragmin, profilin, cofilin, 
Cap42(a), gCap39, Cap2 and destrin.

49. The lipid binding protein fusion 
construct according to claim 47, wherein the lipid 
binding protein is protein kinase C, lipocortin or 
DNase I. 

50. A method for producing a multimeric or 
hetero-multimeric gelsolin fusion polypeptide 
comprising the step of transforming a unicellular host 
with a recombinant DNA molecule comprising a DNA 
sequence encoding a gelsolin fusion polypeptide 
operatively linked to an expression control sequence. 

51. A method for treating a patient having 
AIDS, ARC, HIV infection or antibodies to HIV 
comprising the step of administering to the patient a 
therapeutically effective amount of a multimeric or 
hetero-multimeric CD4-gelsolin fusion construct. 

52. The method according to claim 51 wherein 
the fusion construct comprises a toxin or an anti-
retroviral agent.

53. A method for identifying the presence of 
a target molecule in a sample comprising the step of 
contacting the sample with a hetero-multimeric gelsolin 
fusion construct according to claim 40. 

54. A method for identifying the presence of 
a target molecule in vivo comprising the step of 
administering to a patient an effective amount of a 
hetero-multimeric gelsolin fusion construct according to
claim 40.